CHAPTER I


INTRODUCTION 

The purpose of discrete event simulation is to study systems
which change at discrete points in time. Doing discrete simulation
with a computer requires methods of creating, storing, retrieving
and consuming (destroying) information within the computer during
the course of time. A major consequence of this dynamic information
processing is the generally excessive computer time needed to
conduct discrete simulations [22]. The solution to this problem
involves utilizing efficient methods for non-numerical dynamic
information processing, of which the major task is the efficient
manipulation of priority queues. This is not only because priority
queues are found in the type of system studied in discrete
simulation, but, more importantly, because priority queues are the
mechanism used to move discrete simulations through time so that
they can represent time-varying systems.

Specifically, the research considers alternative forms of
associative memories which may be used to implement the priority
queues found in discrete simulation. A comparison is then made
with a random access memory to determine under what conditions a
particular memory is more efficient for a particular priority
queue. The consideration of associative memories requires that
the concept of data parallelism be considered--that is, the number
of items and the amount of information within each one that can be
acted upon simultaneously by a single computer instruction.

The overall approach used in the research is to postulate, in
parametric form, four computer architectural models--three
associative and one random access. Overlaid on these models are
parametric representations of the priority queues used in discrete
simulation. The total computer time required for each architecture
to perform each priority queue computational task is then
determined. This is followed by a comparison of the performance of
the various architectural types in terms of the total time taken
for similar tasks.

The research draws on four major areas of previous research.

The first area is discrete simulation. The best single reference
for the general state of the art in discrete simulation is
Fishman [22]. He covers both the numeric and non-numeric processes
of discrete simulation and gives special attention to the analysis
of discrete simulations.

A subset of discrete simulation is time flow mechanisms, the
manner in which discrete simulations move through time. In this
area there are several authors who have studied more efficient
implementations. Their research is discussed in Chapter II.

The second area involves general random access memory
algorithmic procedures. The authority in this area is Knuth, in
his four-volume The Art of Computer Programming [36, 37, 38, 39].
His work is heavily referenced in Chapter II and Chapter III.

The third area is associative architecture. The two authors
in this area upon whom this research relies quite heavily are Feng
[17, 18, 19, 20, 21] for associative algorithmic procedures and
Flynn [23] for methodology, measurements and perspective. The work
of these authors and others is discussed in Chapters II and III.

The fourth area deals with the intersection of the previous
three and is the focus of this research. There are four pieces of
research in this area, which are discussed chronologically. The
first was the doctoral thesis of Kroft in 1969 [41]. He designed
and demonstrated an associative memory for list processing. List
processing is very close to the priority queue processes used in
discrete simulation, and much of the early work in discrete
simulation was a direct result of list processing techniques. The
second was a paper by Posdamer and others in 1971 [60] which
advanced several ideas regarding the use of associative
architecture for discrete simulation. The remaining two were
doctoral theses, both published in 1972. The first, by DeFiore [12],
was concerned with the use of the associative memory for data
management purposes. His work involved the use of non-numeric
processes, primarily in the area of fixed, as opposed to dynamic,
data relationships. The last work, by Davis [11], was concerned with
discrete simulation from the point of view of reducing the amount
of time spent in branching within various algorithms. His overall
scheme did include an associative memory, but the main thrust is
toward fully parallel processing. In the specific area of priority
queues and associative memories, no previous detailed research
work is known.

The dissertation is composed of five chapters. Following the
Introduction, background information in discrete simulation is
provided in Chapter II. The computer research models and tasks
are discussed in Chapter III. The results are covered in Chapter IV,
and conclusions and recommendations are in Chapter V. Two
appendices are also included which cover introductory material on
associative processing and the detailed non-numeric algorithms used
in the research.


CHAPTER II


DISCRETE SIMULATION


2.1. Introduction

This chapter is intended to introduce the subject and describe
the functions that are involved in discrete simulation. Tutorial
material along with examples can be found in the references from
which the material in this chapter was taken [16, 22, 26, 53].

Discrete simulation as used in this research deals with the
study of a dynamic system whose behavior can be represented as a
time sequence of discrete changes which occur according to some
stochastic process. Discrete simulation does not deal with the
system directly but with an abstraction of the system called a
model. The model and the means of moving it through time and
measuring its behavior (the simulation mechanism) are represented
in the computer. The complexity of the computer representation is
a function of the complexity of the problem and the system under
study.

Included in this chapter is a discussion of discrete simulation
methodology, the functions involved in discrete simulation,
the manner in which the functions become computer representations,
and, in detail, one of the prime functions that permit a discrete
simulation to move through time in a computer environment. As such,
the general background to computational discrete simulation is
provided to serve as a preamble to the specific research.

Discrete simulation methodology is shown schematically in
Figure 1. A discussion of methodology is included here not only
to provide additional information on discrete simulation but also
to put the intended research into better perspective by showing
specifically where this research applies.

The first step in the methodology is to formulate the problem.
It is at this step that a decision is made whether or not to use
discrete simulation as the problem solving method. Careful
investigation is required before a commitment, because simulation
should only be used if an analytical technique cannot be applied
economically to the problem. Formulation of the problem is usually
augmented by the data collection step. The data are initially
used for the decision and, if the choice is discrete simulation,
they are used for model formulation (including parameter
estimation) and evaluation. This last step is a paper and pencil
situation where the researcher attempts to formulate what he
considers to be an accurate abstraction (model) of a real or
proposed system. Accurate is used here in the sense that the model
must properly imitate the system for the purpose desired and hence
must be sufficiently complete. If the model is too complete, going
into too much detail, excessive cost will be incurred in model
implementation, validation, and execution. The completeness of the
model then becomes a matter of judgment, which is one of the
reasons that discrete simulation is more of an art than an exact
science.

The next step, model implementation, requires the researcher
to transfer his paper model to the computer. This usually involves
implementing a complex computer program where special attention
must be paid to sequencing and describing the model. This task is
somewhat facilitated by the availability of discrete simulation
packages and languages [5, 35, 61, 62, 67]. This step and the
subsequent one of model validation are particularly difficult,
since the researcher must ensure that the model behavior is correct
with respect to the original system. The correctness of the model
can sometimes be validated by comparing selected outputs with the
original system; however, a frequent use of discrete simulation is
to study a system that is in the planning stage or to study some
aspect of an existing system for which there is little information,
and in such cases no direct comparison is possible.

Assuming that the preceding steps have been successfully
completed, the balance of the methodology is concerned with the
formal design of experiments, data generation, and data analysis,
with the report as the final step. The feedback lines in Figure 1
reflect the iterative nature of the total methodology. The
research itself affects those steps which involve the computer
and, in particular, model execution.

2.3. Functional Requirements

The functional requirements for discrete simulation are shown
in Figure 2. These requirements are divided into control and
support functions. The control functions are synonymous with
the simulation mechanism. Many of these ideas are discussed by
Pritsker [62].

Control Functions

F0 - Discrete Simulation Sequencer. The role of the
discrete simulation sequencer is to move the simulation through
the various control functions or states according to the
requirements of the user. This function may not exist as a separate
program and most commonly occurs as a linkage among the other
control functions.

F1 - Initialization. The initialization function sets up
the initial conditions for both the control functions and the
model. This is referred to in the literature as setting up the
model or simulation runs and amounts to setting up all the data
values used to start the simulation.

F2 - Time Flow Mechanism (TFM). The time flow mechanism has
three nominal purposes: maintaining the simulated time or clock,
selecting the next potential happening within the model, and
maintaining a list of future potential happenings [52, 71]. The
TFM does not necessarily cause a state change within a model,
but creates the possibility of a state change according to
preprogrammed instructions. The method used to implement this
function is critical to the computational efficiency of the
simulation and is problem dependent [8, 44, 50, 71]. The normal
process for the TFM is to select a potential next state change
and then activate that part of the model description program
responsible for determining whether or not that particular state
change can take place at this time. If it can, the state change
is implemented by the model description function. At the
completion of the state change, control is returned to the TFM by
the model description function along with a list of future
potential state changes. The TFM places any new potential state
changes in its main list, possibly sorting them on time, updates
the clock if appropriate, and then chooses the next potential state
change, which starts the TFM cycle again. Therefore a discrete
simulation exists by virtue of the cooperation between the TFM and
the model description function.
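
The TFM cycle described above can be sketched as a short event loop. This is a minimal illustration in Python, not the mechanism of any particular simulation language. The names `run_tfm` and `toy_model` are hypothetical. The future event list is assumed to be a Python list kept sorted on time, which is one of the options the text mentions. The model description function is reduced to a callback that performs the state change and returns any new future events.

```python
import bisect
from typing import Callable, List, Tuple

# Future event: (time, sequence number, name). The sequence number breaks
# ties so that events scheduled for the same time keep their filing order.
Event = Tuple[float, int, str]
Model = Callable[[float, str], List[Tuple[float, str]]]


def run_tfm(initial: List[Tuple[float, str]], model: Model,
            end_time: float) -> List[Tuple[float, str]]:
    """Hypothetical next-event TFM: select, advance clock, call model, file new events."""
    future: List[Event] = []      # the TFM's main list, kept sorted on time
    seq = 0
    for t, name in initial:
        bisect.insort(future, (t, seq, name))
        seq += 1
    history: List[Tuple[float, str]] = []
    while future and future[0][0] <= end_time:
        clock, _, name = future.pop(0)   # select earliest change; clock jumps to it
        history.append((clock, name))    # record what was processed, and when
        # The model description function carries out the state change and
        # hands back any new future potential state changes to be filed.
        for new_t, new_name in model(clock, name):
            bisect.insort(future, (new_t, seq, new_name))
            seq += 1
    return history


def toy_model(clock: float, name: str) -> List[Tuple[float, str]]:
    """Hypothetical model: each arrival schedules the next arrival and a departure."""
    if name == "arrival":
        return [(clock + 2.0, "arrival"), (clock + 1.5, "departure")]
    return []


history = run_tfm([(0.0, "arrival")], toy_model, 5.0)
```

With `toy_model`, an arrival at time 0 schedules the next arrival 2.0 time units later and a departure 1.5 units later. Running to time 5.0 therefore processes arrivals at 0, 2 and 4 and departures at 1.5 and 3.5. Removing the first element of a Python list shifts the remaining entries, which mirrors the cost of a sequentially stored list.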

F3 - Program Monitoring and Trace. This function is activated
only at the preprogrammed request of the researcher, either to
report exceptional conditions during the course of the simulation
or to provide snapshots of a partial time history of the model.
Its primary use is as a run time diagnostic tool.

F4 - Data Collection. The data collection function keeps
track of the number of times and amount of time that the model is
in any one of a number of preselected states. This function makes
available the statistical data necessary for the design of
experiments and analysis.


F5 - Report Generator. The report generator supplies a
history of the discrete simulation including any statistical output
requested. Its primary function is to support the data generation
phase of the simulation.

Support Functions

F6 - Model Description. This function contains all the
information necessary to imitate the actual system accurately.
The representation of this function within the computer is quite
difficult and there appears to be no single best way to do it.

F7 - Mathematical Programs. The prime role of this function
is to provide the random numbers and random variates necessary to
reproduce the stochastic nature of the system. Other programs
may involve trigonometric formulae and the computation of minimums
and maximums.

F8 - Data Analysis. This function includes all the activities
necessary to reduce the statistical data produced by the discrete
simulation.

These statistical activities would normally include time
series analysis [3], design of experiments [13, 33, 34], and other
special activities. These activities are not normally a part of
a simulation package or language but are handled separately after
data collection has taken place.

F9 - Data Management. This function is concerned with the
efficient management of data stored in the computer. This function
may exist explicitly, as in data management systems or list
processing languages, or implicitly within other functions. It
is the former that will be used in this research, where each other
function is considered to have one or more complex data activities.
Later this function will be referred to as node management.

It should be clear that, with the possible exception of the
TFM and model description functions, there is nothing specific
to discrete simulation that might not be used in other computer
applications.

2.4. Information Structures

There are three major factors that must be considered in
moving from a paper and pencil representation to a computer
implementation of the problem. These are the data (amount and
type), the data structure (data organization or relationships),
and the operations (functions) germane to the data structures.

When these three items are represented within a particular computer
they are referred to as the data, the storage structure and the
algorithms, discussed below. Several authors have considered the
general and specific problems associated with the efficient
combination of the three items [1, 6, 13, 14, 69]; however, in more
recent years many authors [22, 34] have turned to Knuth [36, 38, 39]
as the reference authority on such matters. In particular,
Fishman [22] points out that the operations and data structures
considered as the building blocks for discrete simulation are
presented in Knuth. It is for this reason, and others discussed
later, that Knuth will serve as the principal reference for this
work.

Before Knuth can be used as the direct reference it is
necessary to make some definitions and, based on these definitions,
relate Knuth's work to other references. Chapin [6] and D'Imperio
[13] make a definite distinction between a data structure and a
storage structure. They point out that a data structure is a way
of looking at or arranging the relationships among the data so that
the resulting structure 'makes sense' with respect to the problem
at hand. In terms of discrete simulation methodology, this
corresponds to the model formulation step where the model
description function is initially set up. The formulation of a data
structure then is a precomputer activity. Once a data structure is
established it must be mapped onto the existing storage system
within the computer. This storage system may differ from computer to
computer and may consist of core memory, magnetic tape or others.
After the mapping, the data structure is then referred to as a
storage structure. Depending on the storage system, there may be
several alternative storage structures for a particular data
structure. The selection of a particular storage structure is based
on the operations to be carried out as part of the problem solution.

These operations correspond in the most general sense to the various
functions involved in discrete simulation discussed in the previous
section. In particular, the operations supporting the data
management function in the management of various storage structures
are of interest in this research. Implementing a particular
function may require one or more algorithms. In the case of data
management in discrete simulation, several algorithms are involved.
Since an algorithm is simply a formal procedure it is possible for
it to exist outside a computer; however, in this research it will
be assumed that it is a computer procedure.

Knuth integrates the dual concept of data structure and
operation into the single concept of an information structure. An
information structure then has form (data structure) and purpose
(function). The function can be represented by operations upon the
data structure. Ultimately an information structure exists within
the computer as a storage structure and one or more algorithms.
An efficient implementation on a computer of an information
structure then requires matching the storage structure with the
algorithms, which in general is both problem and computer dependent.

Storage Structures

Storage structures can be classified into linear and nonlinear
structures, according to Knuth [36]. The linear structures can
be further partitioned into linear lists and linear arrays, where
the latter is an extension of the former. Fishman [22], Gordon [26],
and Emshoff and Sisson [16] illustrate that the linear list is
currently the major storage structure used in discrete simulation.
As such, it will be the main concern of this research.


The following is Knuth's definition of a linear list:


A linear list is a set of n ≥ 0 nodes X(1), X(2), ...,
X(n) whose structural properties essentially
involve only the linear (one dimensional) relative
positions of the nodes: the facts that, if n > 0, X(1)
is the first node; when 1 < k < n, the kth node X(k)
is preceded by X(k-1) and followed by X(k+1); and
X(n) is the last node. [36]

A node is the basic component of a storage structure, and
consists of one or more words of computer memory. The words are
contiguous and each is partitioned into named parts called fields.

There are three common storage structures used to represent
linear lists, at least within conventional computer memories.
These are sequential, single linked, and double linked. Examples
are shown in Figure 5. The first method stores information
sequentially within the computer. The next two methods permit
nodes to be stored randomly within the memory, but the list,
instead of being maintained by contiguity, is maintained by links
which are implanted within each node. In the singly linked case
the link to the second node in the list is stored in the first node,
and so forth. The links in this case are simply the memory addresses
of subsequent nodes. In the doubly linked case each node maintains
the link or address of both its predecessor and its successor in the
list. In both linked cases the lists can be made circular, so
that the last node in the list has the address of the first node.
This permits continuous access throughout the list. The circular
double linked linear list is of particular interest here because
it is the one most often used to implement the information
structures encountered in discrete simulation [16, 22, 26, 36].
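
A circular doubly linked linear list of the kind described above can be sketched as follows. This is an illustrative Python sketch, not the storage structure used in the dissertation. Object references stand in for memory addresses. The header (sentinel) node, the class names and the method names are assumptions made to keep the code short.

```python
from typing import Any, Iterator


class Node:
    """One list node: a data field plus predecessor and successor links."""
    __slots__ = ("value", "pred", "succ")

    def __init__(self, value: Any = None) -> None:
        self.value = value
        self.pred: "Node" = self   # a lone node links to itself (circular)
        self.succ: "Node" = self


class CircularDoublyLinkedList:
    """Circular doubly linked linear list with a header (sentinel) node.

    The header is an assumption of this sketch; it removes special cases
    for the empty list and for the two ends of the list.
    """

    def __init__(self) -> None:
        self.head = Node()   # head.succ is the first node, head.pred the last

    def insert_before(self, node: Node, value: Any) -> Node:
        """Insert a new node just before `node` by updating four links."""
        new = Node(value)
        new.pred, new.succ = node.pred, node
        node.pred.succ = new
        node.pred = new
        return new

    def append(self, value: Any) -> Node:
        """Insert at the end, i.e. just before the header."""
        return self.insert_before(self.head, value)

    def delete(self, node: Node) -> Any:
        """Unlink `node` using only the two addresses stored in the node itself."""
        node.pred.succ = node.succ
        node.succ.pred = node.pred
        node.pred = node.succ = node   # detach the removed node completely
        return node.value

    def __iter__(self) -> Iterator[Any]:
        p = self.head.succ
        while p is not self.head:   # one full circle ends back at the header
            yield p.value
            p = p.succ


lst = CircularDoublyLinkedList()
a = lst.append("A")
c = lst.append("C")
lst.insert_before(c, "B")
lst.delete(a)
```

Deleting a node touches only its two neighbours, whose addresses are held in the node itself. No search and no movement of other nodes is required.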

Operations/Algorithms

There are nine general operations that can be performed on
linear lists. These are shown in Table 1 and are taken from
Knuth [36]. Although all the operations might be used in discrete
simulation, some are used more heavily than others. These are
discussed below and are noted by an asterisk in the table.

The insertion operation involves putting a node into a linear
list. If the list uses a sequential storage structure, then all
the nodes below the point of insertion must be moved down one
node position. If the list is represented by a linked storage
structure, insertion is the process of establishing the proper
linkage with the predecessor and successor nodes within the list.

The deletion operation is the companion operation to insertion;
here a node is removed. If the node was from a list using a
sequential storage structure, all the nodes below it must be moved
up one node position. If the list is linked, the predecessor and
successor nodes must be relinked after the removal operation.
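
For a sequential storage structure, the cost of insertion and deletion shows up as node moves. In this hypothetical sketch a Python list models contiguous storage, and positions are counted from 0. Each function returns the number of nodes it had to move.

```python
from typing import Any, List


def seq_insert(lst: List[Any], k: int, value: Any) -> int:
    """Insert `value` just before position k of a sequentially stored list,
    moving every node below it down one position. Returns the node moves."""
    lst.append(None)                       # extend storage by one node slot
    moves = 0
    for i in range(len(lst) - 1, k, -1):   # move nodes down, bottom first
        lst[i] = lst[i - 1]
        moves += 1
    lst[k] = value
    return moves


def seq_delete(lst: List[Any], k: int) -> int:
    """Delete the node at position k, moving each node below it up one
    position to close the gap. Returns the node moves."""
    moves = 0
    for i in range(k, len(lst) - 1):
        lst[i] = lst[i + 1]
        moves += 1
    lst.pop()                              # release the now-unused last slot
    return moves


nodes = [10, 20, 30, 40]
insert_moves = seq_insert(nodes, 1, 15)    # nodes: [10, 15, 20, 30, 40]
delete_moves = seq_delete(nodes, 0)        # nodes: [15, 20, 30, 40]
```

Working near the front of an n-node sequential list moves roughly n nodes. A linked insertion or deletion, by contrast, changes only a fixed number of links regardless of list length.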

In both the above operations, it is necessary to know the
exact location of the node. The location can be determined
by preceding the insertion with a sort operation which ranks the
nodes according to the values in some specified field. Based on
this ordering, the field value of the next node is usually less
than or greater than that of the node to be inserted. For deletion
it might only be necessary to remove the first or last node in a
ranked list, so that the location is already determined. If the
deletion is to be of a node which is not first or last, then a
search operation must take place. This amounts to going through the
list one node at a time until the proper node is found, based on
some search criteria.

1. Gain access to the Kth node of the list to
examine or change the contents of its fields.

*2. Insert a new node just before the Kth node.

*3. Delete the Kth node.

4. Combine two or more (linear) lists into a
single list.

5. Split a list into two or more lists.

6. Make a copy of a linear list.

7. Determine the number of nodes in a linear list.

*8. Sort the nodes of a list into ascending order.

*9. Search the list for the occurrence of a node
with a particular value in some field.

*Indicates operations receiving heaviest use in
discrete simulation.

Table 1. Linear Information Structure Operations
(from Knuth [36])
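
The ranked insertion, deletion of the first node, and linear search just described can be illustrated on a circular doubly linked list kept in ascending key order. This sketch rests on several assumptions: the list has a header node whose key is never examined, keys are floats, and equal keys are kept in arrival order. All function names are hypothetical.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: float) -> None:
        self.key = key
        self.pred: "Node" = self
        self.succ: "Node" = self


def ranked_insert(head: Node, key: float) -> int:
    """Insert into a list kept in ascending key order. Scans from the front
    for the first node with a greater key; returns the nodes examined."""
    p, examined = head.succ, 0
    while p is not head and p.key <= key:   # `<=` keeps equal keys in arrival order
        p = p.succ
        examined += 1
    new = Node(key)
    new.pred, new.succ = p.pred, p          # link in just before p
    p.pred.succ = new
    p.pred = new
    return examined


def search(head: Node, key: float) -> Optional[Node]:
    """Linear search, one node at a time, for a node with the given key."""
    p = head.succ
    while p is not head:
        if p.key == key:
            return p
        p = p.succ
    return None


def unlink(node: Node) -> None:
    """Remove a node located by search, relinking its two neighbours."""
    node.pred.succ = node.succ
    node.succ.pred = node.pred


def delete_first(head: Node) -> float:
    """Remove the first (lowest-key) node; its location needs no search."""
    first = head.succ
    if first is head:
        raise IndexError("delete from empty list")
    unlink(first)
    return first.key


def keys(head: Node) -> List[float]:
    out, p = [], head.succ
    while p is not head:
        out.append(p.key)
        p = p.succ
    return out


head = Node(0.0)   # header node; its key is never examined
counts = [ranked_insert(head, k) for k in [5.0, 2.0, 8.0, 2.0]]
```

`ranked_insert` reports how many nodes it examined, a number that grows with the length of the list. Removing the lowest-key node needs no search at all.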

The four operations just discussed--insert, delete, sort and
search--are all involved with changing the content of the list.
In discrete simulation the contents of many lists are changing
with time as the simulation progresses. This activity is what
enables discrete simulation to represent dynamic systems.

As mentioned earlier, efficient computer implementation requires
matching the storage structure and the algorithms.
In discrete simulation, this has come to mean that the principal
working storage structure is the circular double linked linear
list. This is so for the general and specific reasons listed
below, paraphrased from Knuth [36]. (Note particularly the relation
of item two to the dynamic nature of the lists in discrete
simulation mentioned earlier.)

1. Linked allocation does require more memory space
than sequential; however, in the circular double linked
linear list (CDLLL) case of discrete simulation this
usually amounts to one word per node, split into two address
fields, where the node may typically contain up to
ten words. In other words, one would incur a ten per cent
storage disadvantage in order to obtain the benefits listed
below.

2. Deletion within the CDLLL is particularly fast because
the deleted node contains both successor and predecessor
addresses within itself. Thus deletion requires only
a few index operations. In the sequential case several
nodes may have to be moved to fill the gap caused by
deletion.

3. Insertion is also generally simpler when linkage is used.
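
The storage figure in item 1 can be checked directly. Taking the text's own numbers, there is one link word (holding both the predecessor and successor address fields) per node of about ten words:

```latex
\text{storage overhead} \approx
  \frac{1\ \text{link word}}{10\ \text{words per node}} = 0.10 = 10\%
```

If the ten words are taken to exclude the link word, the overhead relative to total storage is 1/11, or about 9 per cent. Either way the result is close to the ten per cent quoted.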

There are two additional operations which are necessary to
support a highly dynamic environment. These are the processes of
allocation and deallocation, which are involved with the management
of physical storage such as computer words. Allocation forms
physical storage into nodes and then makes the nodes available
for use. When a node is no longer needed, such as following a
deletion operation, the deallocation process makes the physical
storage used by that node available for future allocation.
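
Allocation and deallocation are commonly handled with a list of available nodes. The sketch below makes several assumptions. It uses a fixed block of words divided into equal-sized nodes and keeps the free nodes on a stack, in the spirit of Knuth's available-space list. `NodePool` and its methods are hypothetical names, and last-in first-out reuse is one policy among several.

```python
from typing import List


class NodePool:
    """Fixed block of words carved into equal-sized nodes, with a free list."""

    def __init__(self, n_nodes: int, node_size: int) -> None:
        self.node_size = node_size
        self.memory: List[int] = [0] * (n_nodes * node_size)   # physical words
        # Allocation: form the words into nodes and make them all available.
        # The stack is filled so that the lowest address is handed out first.
        self.avail: List[int] = [i * node_size for i in range(n_nodes - 1, -1, -1)]

    def allocate(self) -> int:
        """Take a free node off the available list and return its address."""
        if not self.avail:
            raise MemoryError("no free nodes (overflow)")
        return self.avail.pop()

    def deallocate(self, addr: int) -> None:
        """Clear a node's words and return it to the available list."""
        for w in range(addr, addr + self.node_size):
            self.memory[w] = 0
        self.avail.append(addr)


pool = NodePool(n_nodes=3, node_size=4)
first = pool.allocate()    # address 0
second = pool.allocate()   # address 4
pool.deallocate(first)     # node at address 0 becomes free again
```

With a stack as the available list, the most recently freed node is the next one handed out. This keeps both operations to a constant amount of work.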

For each operation mentioned there is an algorithm which
implements it. In this case there is a one to one correspondence
between operation and algorithm. In general, however, there are
several choices as to which algorithm should be used to implement a
particular operation. Chapter III discusses the selection of
the algorithms considered in this research.

2.5. Priority Queues and Time Flow Mechanisms

A priority queue is an information structure that occurs in
all discrete simulations. It is the information structure most
commonly considered to be the key to improving the computational
efficiency of discrete simulation because of its relationship to
TFMs. For this reason it is the information structure that is the
subject of this research. Knuth points out that priority queues
exist frequently in other computational applications outside
discrete simulation [36]. This research applies to these other
situations if the specific priority queues are similar.


To support the preceding remarks, the following discussion
will delineate priority queues and the various roles they play
in discrete simulation.

A priority queue is a combination of a waiting line and a
service discipline. It has form (data structure) based on the
relationship of the items (data) waiting for service. It has
function in that the purpose of a priority queue is to provide
service to the items waiting according to the service discipline
(operation).

There must be some selection process by which the next item
is selected for service. This is called the priority or queue
discipline. The priority discipline must provide the rules for
making the following two decisions [32]:

1. Which unit to select for service once the server
is free to take up the next unit.

2. Whether to continue or discontinue the service
of the unit being serviced.

There are two special cases of priority queues, which will
hereafter be called simply queues. These are the first in first out
(FIFO) and the last in first out (LIFO) disciplines. These queues
will be studied separately because of their special properties when
represented by storage structures and algorithms.
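
The two special cases need no key comparisons at all. A FIFO queue adds at one end and removes from the other, while a LIFO queue adds and removes at the same end. The minimal Python illustration below uses the standard library's `collections.deque` purely for convenience; it is not the storage structure assumed in the text.

```python
from collections import deque

# FIFO: items join at the tail and are served from the head.
fifo = deque()
for item in ["a", "b", "c"]:
    fifo.append(item)
fifo_order = [fifo.popleft() for _ in range(len(fifo))]

# LIFO: items join and leave at the same end, as in a stack.
lifo = deque()
for item in ["a", "b", "c"]:
    lifo.append(item)
lifo_order = [lifo.pop() for _ in range(len(lifo))]
```

In both cases the position of the next item to serve is fixed (an end of the structure), so neither a sort nor a search is needed. This is the special property that sets these queues apart from general priority queues.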

Priority queues occur in discrete simulation in two major
ways. The first is that frequently the system under study involves
priority queues. Where such a case exists the model description
function, which eventually must be represented by storage structures
and algorithms, must imitate the priority queue. The second
occurrence of priority queues is in the time flow mechanism
function. Since the main purpose of the TFM is to select the
next potential happening (event) within the model from a waiting
line of choices, it is by definition a priority queue. This
selection process is done by a primary key that contains the
simulation time for the next potential happening. The priority
queue discipline is then lowest key value (earliest time) first.
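
The TFM's discipline, lowest key value (earliest time) first, can be sketched with a binary heap. This illustration rests on assumptions. It is not one of the structures studied in the text, which concentrates on linked lists; it simply uses Python's `heapq` module. The sequence counter, which serves events with equal times in the order they were scheduled, is also an added assumption. `EventQueue` and its methods are hypothetical names.

```python
import heapq
import itertools
from typing import List, Tuple


class EventQueue:
    """Priority queue served lowest time first; ties go in scheduling order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()   # secondary key for equal times

    def schedule(self, time: float, event: str) -> None:
        """File a future event under its primary key, the event time."""
        heapq.heappush(self._heap, (time, next(self._counter), event))

    def next_event(self) -> Tuple[float, str]:
        """Remove and return the event with the earliest time."""
        time, _, event = heapq.heappop(self._heap)
        return time, event

    def __len__(self) -> int:
        return len(self._heap)


q = EventQueue()
q.schedule(4.0, "end service")
q.schedule(1.0, "arrival")
q.schedule(4.0, "arrival")
order = [q.next_event() for _ in range(len(q))]
```

The two events at time 4.0 come out in the order they were scheduled because the counter acts as a secondary key.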

Several researchers have pointed out that the TFM is of
major concern in efficient computer implementation of discrete
simulation and have also gone on to study various ways of improving
the TFM. Among these researchers are Conway [8], Lave [44],
Nance [52], Wickham [71], Vaucher and Duval [69], and Morgan and
Siegel [50]. Since this research will address time flow mechanisms,
and in particular Morgan and Siegel's work, additional
background is provided.

With the exception of Wickham's work, time flow mechanisms (at least since Conway) have been classified into two types, known as the variable time increment (VTI) and the fixed time increment (FTI). Morgan and Siegel suggested an adaptive TFM which switched between the two types of TFM in accordance with a certain decision rule. In all cases except Morgan and Siegel's, the intention has been to select a priori, or match, an efficient storage structure and its attendant algorithms for the particular discrete simulation.

Variable Time Increment

The variable time increment (VTI) time flow mechanism is a procedure whereby the TFM advances time to the next potential state change. As Conway [8] and Morgan and Siegel [50] point out, this method is less time-consuming if the state changes occur relatively infrequently with respect to the unit of time used in the simulation. The very nature of this procedure also requires that future state changes be schedulable; that is, if some entity is due to change state at some future time, that time must be known, or calculable, without regard for a state change involving some other entity at a time prior to the actual state change. As an example, the arrival of a patient at the emergency room, considered as an activity external to the model (exogenous activity), can be calculated without regard for other happenings within the model. Such an arrival would constitute a new item in the system.
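The VTI procedure can be sketched with a binary heap as the priority queue of future events, ordered on event time so that the lowest key (earliest time) is selected next. This is a minimal illustration, not the research's implementation; `run_vti`, `follow_on`, and `arrivals_every` are hypothetical names, and the fixed interarrival time is an assumption chosen only to keep the example deterministic.

```python
import heapq


def run_vti(initial_events, end_time, follow_on):
    """Variable time increment: jump the clock straight to the next event.

    initial_events: (time, name) pairs; follow_on(clock, name) returns any
    new (time, name) events that handling this event schedules.
    """
    future = list(initial_events)
    heapq.heapify(future)                      # priority queue keyed on time
    trace = []
    while future and future[0][0] <= end_time:
        clock, name = heapq.heappop(future)    # lowest key = earliest time
        trace.append((clock, name))
        for event in follow_on(clock, name):   # schedule known future changes
            heapq.heappush(future, event)
    return trace


def arrivals_every(interarrival):
    """Hypothetical exogenous arrival process with a fixed interarrival time."""
    def follow_on(clock, name):
        if name == "arrival":
            return [(clock + interarrival, "arrival")]
        return []
    return follow_on
```

Starting from one arrival at time 0 with an interarrival time of 4, the clock jumps 0, 4, 8 and stops because the next arrival (12) lies beyond the end time of 10; no time is spent stepping through the empty intervals in between.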

Fixed  Time  Increment 

The fixed time increment (FTI) time flow mechanism relies on advancing the simulation clock by a unit of time and then conducting a comparison with each of the entities to see if any next potential happenings fall within the last time increment. In terms of storage structures and algorithms, the FTI approach is equivalent to a search of a random (non-ranked) list for all nodes that are less than or equal to some time value.
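For contrast, an FTI sketch over an unranked list: the clock moves in steps of `dt`, and after each step every pending node is scanned for times less than or equal to the clock. The function name is hypothetical; it reports both the clock at which each event is detected and its actual time, so that the time error is visible.

```python
def run_fti(events, end_time, dt):
    """Fixed time increment over an unranked list of (time, name) events.

    Returns (clock, actual_time, name) triples: each event is detected only at
    the end of the increment in which it falls.
    """
    pending = list(events)        # random (non-ranked) list: no ordering kept
    trace = []
    step = 0
    while pending and step * dt < end_time:
        step += 1
        clock = step * dt         # integer step count avoids accumulated drift
        # Scan every pending node for times <= clock.
        due = [e for e in pending if e[0] <= clock]
        pending = [e for e in pending if e[0] > clock]
        trace.extend((clock, t, name) for t, name in due)
    return trace
```

With `dt = 1`, events at 2.5 and 2.7 are both detected at clock 3.0, and their relative order is simply their position in the list, not their times.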

Lave points out that there is a time error, and possibly a precedence error, due to the fact that one does not know exactly when the state change took place, only the interval within which it occurred [44]. Gafarian and Ancker [25] conducted a study comparing the relative efficiency of the VTI and FTI methods with regard to estimating the expected output of the system and found that, because of the time error, information is always lost in FTI time flow mechanism simulators. The main advantage of the FTI method is that it is frequently faster than VTI under conditions of dense event changes.

Dynamic Time Flow Mechanism

Morgan and Siegel's TFM switched between the VTI and FTI methods described above. This was done based on a look-ahead procedure which measured the upcoming density of future state changes. Based on a decision rule, the TFM used the FTI method for dense state changes and the VTI method for sparse state changes. The purpose of this approach was to increase the efficiency of the TFM by minimizing the time spent in the TFM algorithm. A TFM method is discussed later that incorporates Morgan and Siegel's philosophy but with better efficiency.
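The switching idea can be sketched as follows. Morgan and Siegel's actual decision rule is not given here, so the look-ahead rule below (count the events due within a window and switch to FTI when the count reaches a threshold) is a hypothetical stand-in, as are the names `choose_mode` and `run_adaptive` and the parameters `window` and `threshold`.

```python
def choose_mode(pending_times, clock, window, threshold):
    """Hypothetical look-ahead rule: FTI if upcoming events are dense, else VTI."""
    upcoming = sum(1 for t in pending_times if clock < t <= clock + window)
    return "FTI" if upcoming >= threshold else "VTI"


def run_adaptive(times, end_time, dt, window, threshold):
    """Advance the clock with FTI or VTI steps; returns a (mode, clock, due) log."""
    pending = sorted(times)
    clock = 0.0
    log = []
    while pending and clock < end_time:
        mode = choose_mode(pending, clock, window, threshold)
        if mode == "FTI":
            clock += dt                     # fixed step through a dense region
        elif pending[0] <= end_time:
            clock = pending[0]              # variable step to the next event
        else:
            break                           # next event lies beyond end_time
        due = [t for t in pending if t <= clock]
        pending = [t for t in pending if t > clock]
        log.append((mode, clock, due))
    return log
```

With four events clustered between 1.0 and 1.6 and one at 10.0 (window 2, threshold 3, step 1), the sketch takes two fixed steps through the cluster and then a single variable jump to 10.0.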
COMPUTER RESEARCH MODELS AND TASKS


3.1. Introduction

The overall research methodology is shown in Figure 4. This figure will serve as the basis of discussion for Chapters III and IV. The figure also serves as a means of tying together the concepts of this research, as discussed below.

The starting point for the research methodology is the selection of the information structures to be studied. The rationale for the selection of priority queues and their relationship to discrete simulation was discussed in Chapter II. In particular, ten cases of priority queues are considered in the research. Four of the cases fall under the definition of queues and the remaining six are priority queues. In the latter category, the first three cases deal specifically with priority queues used for time flow mechanisms. The remaining cases deal with general priority queues that are used in discrete simulation. All these cases are listed at the bottom of Figure 4.

Figure 4. Detailed Research Methodology

An information structure is one or more operations (algorithms) operating on a data structure (storage structure). The intersection of a particular storage structure and a set of algorithms constitutes an information structure. However, the intersection of a different storage structure and a different set of algorithms (detailed instruction structure) can define the same information structure. In effect there are different ways to implement an information structure, particularly if the architecture (instruction set) is also permitted to vary. To place a meaningful bound on the scope of the research, a single storage structure was chosen for each architecture that seemed, based on some experiments, to be an efficient match for that architecture.

The second input to the research methodology is the selection of the architectures to be used for comparison. In addition to the random access memory, which was used as a reference, three other architectures involving associative memories are used. The architectures are shown along the slanting left-hand axis of Figure 4. They are the random access memory (RAM), the associative memory (AM), the associative memory used in conjunction with a random access memory (AM/RAM), and finally an associative memory coupled with a random access memory, with a special associative search memory used as an auxiliary memory to the first two (AM/RAM/AML). These memories are defined through a parameterized instruction set (type of instruction and instruction timing), which in turn leads to a parametric representation of the architecture. These two steps are shown in the dotted architecture definition block in Figure 4. In addition the


respectively the circular double linked linear list (CDLLL), the single linked linear list (SLLL), the double linked linear list (DLLL), and the triple linked linear list (TLLL).

Shown to the right of the storage structure axis is a small table, keyed to the specific architecture, which lists keys, links, and parameters. The link column gives the specific names of the links which form the storage structure. The keys complement the links: where the links identify a specific list and the current organization of its stored data, the keys permit the specific identification of a node and hence permit the selection of specific nodes or the ordering of nodes. The ordering is then the current link structure. The links and keys are covered more fully when Table 3 is discussed later. The parameters, also discussed later, permit the behavior of the various storage structures to be studied as part of the overall study of the information structure. The parameters represent the current storage state of the queue or priority queue.

To complete the discussion of the information structures, the five algorithms germane to the ten information structure cases are shown on the vertical axis. This is a slightly different set of algorithms (operations) from that discussed in Chapter II, because the sort activity has been folded into the insert algorithm: sort does not apply for all architectures where insert does. The intersection of the five algorithms with each storage structure (and hence architecture) then forms an information structure, and in particular one specific information structure (queue or priority queue) for each of the ten cases. Altogether, the research explores fifty-seven situations. The information structures and the architectures to be compared on the basis of them have been mated parametrically. This completes the first part of the methodology.

The second part of the research methodology is concerned with the acquisition of comparison data on the architectures.

The first step is the definition of node cycles. A node cycle can be thought of as a completed action involving a node in an information structure. This concept is based on the intimacy between a node and the data stored therein. In discrete simulation the data and the node usually have the same lifetime; therefore it is possible to talk about a node cycle instead of a data cycle. The two completed actions used for comparison in the research are the creation and storage of data (node) within a storage structure and the removal and destruction of the data (node) at some future time. These two actions maintain the dynamic information structures of this research. Because of variations arising from measuring these two actions separately, a composite node cycle was defined which concatenates the two actions into what is later described as a birth and death process for nodes. This approach permits studying typical life cycles of nodes based on the storage structure parameters mentioned earlier.
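A rough sketch of one composite node cycle (a birth followed by a death), assuming a priority queue kept as an ascending list and scanned linearly as a linked list would be. The function name, and the use of key comparisons as a stand-in for computation time, are illustrative assumptions rather than the measurement used in the research.

```python
def composite_node_cycle(queue, new_key):
    """One birth-and-death cycle on an ascending list.

    Birth: insert new_key at its ordered position by linear scan.
    Death: remove the node with the lowest key (the head of the list).
    Returns (removed_key, comparisons_used); queue is modified in place.
    """
    comparisons = 0
    pos = 0
    while pos < len(queue):
        comparisons += 1
        if new_key < queue[pos]:
            break
        pos += 1
    queue.insert(pos, new_key)
    removed = queue.pop(0)
    return removed, comparisons
```

Because every birth is paired with a death, the queue length stays constant from cycle to cycle, so a cycle can be studied at a fixed storage state.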

The next step is measurement. The measure used in this research is the total computation time for the composite node cycle of the nodes within each situation studied. The second part of the research methodology thus permits the collection of quantitative data on the behavior of the information structures.

The third portion of the research methodology is concerned with the formal comparison of the architectures. The problem here is that without certain assumptions the associative and random access architectures do not easily permit direct comparison. The principal assumption alleviating this problem is that the associative memory has full word operations, just as the typical random access memory does. In short, this means that a multiple word compare in an associative memory would take no longer than a single word compare in a random access memory. Current commercial implementations of associative memories do not use full word compares, partly because this results in increased manufacturing problems and hence costs, and partly because not all algorithms lend themselves easily to such an implementation. One case in point is the minimum search algorithm. Since the minimum search forms such an integral part of maintaining priority queues in discrete simulation, the alleviating assumption mentioned above could not be used directly.

The alternate procedure used for comparison was to let the processing width (the number of bits simultaneously active in a word) in the associative memory vary, where algorithms permitted, from a single bit up to a full word. The result of this procedure is that a trade-off can now be made parametrically between the processing width for associative memories and the equivalent total time necessary for the random access memory. This idea is formalized in the equivalent breakeven bit width (Eq) equations of Chapter IV. The results of Chapter IV then indicate under which conditions, in terms of processing width, the associative architectures excel for a particular composite node cycle representing a particular information structure. The overall result of this methodology is twofold. First, there is a detailed approach to doing comparisons in which the degree of aggregation can be controlled, since it is possible to study the detailed algorithms intimately or to throw away information selectively. Second, there is the ability to study the suitability of various forms of associative architecture for the implementation of representative priority queues found in discrete dynamic simulation.
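The trade-off can be illustrated with a deliberately simplified timing model, which is not the Eq equations of Chapter IV: assume the RAM must compare each of n words at T_M per word, while a comparand compare in the associative memory costs ⌈w/γ⌉ interrogations at T'_M each, following the ⌈w/γ⌉ notation used for comparand compares. The sketch searches for the smallest processing width γ at which the associative memory is no slower; all function names are hypothetical.

```python
import math


def ram_search_time(n_words, t_m):
    # Sequential search: one single-word compare per stored word.
    return n_words * t_m


def am_compare_time(w, gamma, t_m_prime):
    # ceil(w / gamma) interrogations, each handling gamma bit slices of every word.
    return math.ceil(w / gamma) * t_m_prime


def breakeven_width(n_words, w, t_m, t_m_prime):
    """Smallest gamma in 1..w for which the associative compare is no slower.

    Returns None if even a full-word (gamma = w) compare loses.
    """
    target = ram_search_time(n_words, t_m)
    for gamma in range(1, w + 1):
        if am_compare_time(w, gamma, t_m_prime) <= target:
            return gamma
    return None
```

Under these assumptions, with 32-bit keys and T'_M = 2 T_M, searching 4 words needs a width of 16 bits to break even, while searching 100 words is already won at a width of one bit.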

This completes the formal research methodology. The remainder of Chapter III is partitioned into three sections. In the first section the computer research models, including architecture and storage structures (the static portion of the information structures and architectures), are covered. In the next section the research algorithms, or the active portion of the information structures, are discussed. This includes algorithms, search keys, and parameters as they apply to the information structures.

Finally, there is a section on measurements. Figure 4 should be referred to often as reading progresses.

Computer Research Models


The purpose of studying the behavior of information structures with models is to have a controlled method for delineating the architecture and a method of comparison. In particular, the models specified here can be programmed with a particular workload and therefore offer the opportunity of a detailed analysis that may not be available when working with a more abstract model. This approach permits the consideration not only of the specific task but also of the overhead necessary to prepare the data for that task. This last point is significant in that Flynn [23] points out that this overhead can significantly affect the relative performance of parallel architecture computers.

Architecture 

The first step in setting up the models is the definition of the architecture. This is equivalent to the definition of the instruction set (Table 2). The instruction set is used because it is the basis of implementation of the algorithms and therefore serves as the basis for measurement, since timings can be conveniently assigned to groups of instructions.

To establish the instruction set on a sound basis for this and future comparisons, a well-established sequential computer model was selected as the basis for all the memory systems. This model is the pedagogical computer MIX, created by Knuth for his Art of Computer Programming series [36, 37, 38, 39]. This selection immediately yielded the following benefits:

1. MIX is fully documented in terms of an instruction set and instruction timing.

2. The RAM algorithms for manipulating the information structures of interest in the research are documented on the MIX computer.

3. The MIX computer (in terms of its instruction set) permitted a reasonably straightforward conversion from a sequential computer to an associative computer with the addition of, or replacement by, an associative memory.

                                                     Instruction Timing
Instructions        Remarks                MIX-RAM   MIX-AM       MIX-AM/RAM   MIX-AM/RAM/AML

Load
  LDA ADR,I(F)      Add L suffix           T_M       T'_M         T'_M         T'_M
  LDX ADR,I(F)      for AML                T_M       T'_M         T'_M         T'_M
  LDi ADR,I(F)                             T_M       T'_M         T'_M         T'_M
Store
  STA ADR,I(F)      Add L suffix           T_M       T'_M         T'_M         T'_M
  STX ADR,I(F)      for AML                T_M       T'_M         T'_M         T'_M
  STi ADR,I(F)                             T_M       T'_M         T'_M         T'_M
  STZ ADR,I(F)      Store zero             T_M       T'_M         T'_M         T'_M
Address transfer
  INCi M            M is the value to be   T_A       T'_A         T'_A         T'_A
  DECi M            transferred            T_A       T'_A         T'_A         T'_A
  ENTi M                                   T_A       T'_A         T'_A         T'_A
Jump
  JA                                       T_A       T'_A         T'_A         T'_A
  JX                                       T_A       T'_A         T'_A         T'_A
  Ji                                       T_A       T'_A         T'_A         T'_A
  JMP                                      T_A       T'_A         T'_A         T'_A
  JMI               Match indicator        N/A       T'_A         T'_A         T'_A
  JLI                                      N/A       T'_A         T'_A         T'_A
Compare**
  CMPA I(F)                                T_M       ⌈w/γ⌉T'_M*   ⌈w/γ⌉T'_M*   ⌈w/γ⌉T'_M*
  CMPX I(F)                                T_M       ⌈w/γ⌉T'_M*   ⌈w/γ⌉T'_M*   ⌈w/γ⌉T'_M*
  CMPi I(F)                                T_M       ⌈w/γ⌉T'_M*   ⌈w/γ⌉T'_M*   ⌈w/γ⌉T'_M*
  CMPA/MIN I(F)                            N/A       wT'_M*       wT'_M*       wT'_M*
  CMPL/MIN          Lewin's algorithm      N/A       N/A          N/A          (2M-1)T'_M

*  w = w_i,k, the width of the i-th field in the k-th list.
   γ = processing width in bits.
   δ = transfer width in bits.

** For the detailed algorithms in Appendix B:
   /  indicates associative compare with prior reset (e.g. CMPA/EQ I(F)).
   // indicates associative compare without prior reset (e.g. CMPA//EQ I(F)).

Table 2. Instruction Sets

To explain this last step and to introduce the associative architecture, some background is necessary. Two distinct memories are considered in this research: the random access and the associative. The random access memory is characterized by the fact that it may access any location in memory randomly, one location at a time. An associative memory may access all selected locations at once, although not necessarily the full word at each location.

In this sense the associative memory is able to associate (compare) some input with all candidate stored words (as in a search) and annotate all matching words. The associative memory may not consider all parts of all words in a single memory activation or interrogation because of the associated cost of the hardware [30, 69]. In its simplest form, the associative memory can address one bit in each selected word, usually the same bit. This method of implementation is sometimes known as the bit slice or bit mode [21].
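A bit-slice equality search can be emulated in software to show what one interrogation does. In this sketch (all names hypothetical), each interrogation examines `gamma` adjacent bit slices of every word, gamma being the number of bit slices handled at once, and clears the response bit of any word that disagrees with the comparand; the inner loop over words stands in for what the hardware does in parallel. The interrogation count comes out as ⌈w/gamma⌉.

```python
def associative_equal_search(words, comparand, w, gamma=1):
    """Emulate a bit-slice associative equality search.

    words: non-negative ints of width w bits.
    Returns (response, interrogations); response[i] is True for matching words.
    """
    response = [True] * len(words)        # all words initially selected
    interrogations = 0
    for high in range(w, 0, -gamma):      # most significant slices first
        low = max(high - gamma, 0)
        mask = ((1 << (high - low)) - 1) << low   # the slices examined this time
        interrogations += 1
        for i, word in enumerate(words):  # done for all words at once in hardware
            if response[i] and (word & mask) != (comparand & mask):
                response[i] = False
    return response, interrogations
```

With 4-bit words, `gamma=1` takes four interrogations and `gamma=3` takes two, while the response vector is the same in both cases.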

Flynn [23] formalizes the preceding discussion by pointing out that the sequential computer is a single instruction single datum (SISD) form of architecture, while the associative memory falls into the class of single instruction multiple data (SIMD) architecture. Therefore the only difference in theory is that the associative memory exhibits a form of data parallelism, in that it can operate on several pieces of data simultaneously.

In practice, conversion of MIX to an associative memory did not require a change of instructions, only a redefinition with respect to multiple data. (A general example of associative memories is given in Appendix A.)

The four memory organizations of Figure 5 may be considered as follows. Figure 5A represents the standard MIX computer with the random access memory, as documented in Knuth's books.

Figure 5B represents the MIX computer with an associative memory substituted for the random access memory. Notice that, with the exception of the busy bit, the response store register, and the match indicator, both memories contain the same number of words, Mj, each with the same bit width, M_w. That means that each memory configuration has the same number of data words of the same width available for storage.

Note that this is different from STARAN [63], which has very long word lengths, i.e., 256 bits. The intent here is to make the memories as comparable as possible and thus to isolate the single datum from the multiple data for measurement purposes. In performing the actual research it is assumed that each word holds only one field, so that the actual value of M_w is not important. The exception, noted later, is Lewin's memory.

Figure 5C represents a combination of the first two memory configurations such that the total storage remains the same. This configuration was suggested by the fact that in discrete simulation a node usually contains several fields, but only one or two of these contain search keys used for retrieving a particular node, while the other fields merely contain information that is used after a node is selected. Since the amount of storage used for other than search keys can be significant in discrete simulation, and since in general it is more expensive to hold data in associative rather than random access form [30, 70], why not split the node between associative and random access storage with appropriate linkage? This is the essence of this configuration.

The last memory organization, shown in Figure 5D, was included because of the minimum search problem mentioned in the introduction. It is based on Lewin's algorithm [19, 45, 72, 73]. The minimum search problem is particularly critical in cases 1, 2, and 3 for priority queues. The basic organization of this fourth configuration is the same as that of the third, except that a parallel input/output (PIO) channel has been added between the associative memory and the associative memory-Lewin, labelled AML. The AML memory is M_a words long by K·M_w bits wide, where K is some integer. M_a would typically be ten words or less, and K would be between five and ten for most discrete simulation applications. The essence of the algorithm is that it guarantees that M distinct records can be identified (selected) in order, ascending or descending, in not more than 2M-1 memory interrogations. The benefit of this auxiliary memory is rapid ordered retrieval, but it requires a more complex, and therefore more costly, memory. This is offset to some extent in that it probably need not be very large.

The second part of the memory definition is the specification of the instruction set. The instruction set for all four memory configurations is given in Table 2. The load, store, and address transfer instructions and the first four jump instructions operate in the same way for all four memories. They are documented and discussed in Knuth's books [36, 37, 38, 39] and will not be further elaborated here. Load and store instructions pertaining exclusively to the AML memory are suffixed by L; otherwise all memory, regardless of block, is considered contiguous in addressing. This set of conditions implies that these associative memories may be addressed either in a data parallel mode or by conventional addressing. For a further discussion of these ideas consult Stone [...], Shooman [..., 63], Wolinsky [...], Brotherton [...], and Rudolph [...].

Wolinsky [...], Parhami [...], and Hyde [...] provide good backgrounds on associative memories and related technology.

The two remaining jump commands, JMI and JLI, are used to test the match indicators associated respectively with the associative memory and the AML memory.

The major difference among the memories occurs with the compare instruction. The normal compare instruction for MIX permits a value stored in any register to be compared with any specified word in memory. The associative counterpart is that any value in any register may be compared in a data parallel mode throughout the memory. A distinction is made, however, by assigning a processing width factor (γ) to the associative memory, so that the total amount of time necessary for an associative memory to complete a compare is a function of both the number of bits used to represent a search or compare field (w) and the simultaneous processing width (γ). This is indicated by the ⌈w/γ⌉ symbol in the timing columns for comparand compares, where ⌈ ⌉ stands for the next highest integer value of the quotient if there is a remainder. Gamma then represents the number of bit slices active simultaneously for a given memory interrogation. The companion to gamma is delta (δ), which represents the number of bit slices transferred simultaneously from the associative memory to the special Lewin's memory. In both cases the maximum value of either gamma or delta is M_w.

The general compare discussed above is also referred to as a comparand compare. In effect a reference value is set up in a register and all selected words are compared with it according to the compare criterion: greater than, less than, et cetera. Each word is compared independently of the others, so that gamma can be meaningfully introduced. There is another type of compare, which is a noncomparand compare: the search for a minimum or maximum. Such a search cannot be conducted independently on each word; therefore simply increasing the processing width is not appropriate. Feng [19] discusses various types of minimum and maximum searches and points out that Lewin's algorithm is appropriate for achieving the fastest ordered retrieval (selecting members from a list in order). In this research either gamma will be one in the minimum search algorithm of Feng [21], or Lewin's algorithm will be used. This bounds, for the purposes of the research, the two documented extremes for "parallel" minimum searches. The minimum compare is very important, since it is the main associative instruction for the implementation of priority queues.
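The bit-serial minimum search with gamma equal to one can be sketched as below. This is the familiar most-significant-bit-first elimination scheme, offered as an illustration under that assumption rather than as Feng's exact algorithm; the names are hypothetical. Each interrogation examines one bit slice of the remaining candidates, so w interrogations are needed however many words are searched, and each step depends on the candidates left by the previous one, which is why widening the slice does not help directly.

```python
def associative_min_search(words, w, selected=None):
    """Bit-serial (gamma = 1) minimum search over the selected words.

    Scans bit slices from most to least significant; whenever some candidate
    has a 0 in the current slice, candidates with a 1 there are dropped.
    Returns (indices of the minimum words, interrogations).
    """
    candidates = list(selected) if selected is not None else [True] * len(words)
    interrogations = 0
    for bit in range(w - 1, -1, -1):
        interrogations += 1
        # Candidates holding a 0 in this bit slice.
        zeros = [c and not (words[i] >> bit) & 1 for i, c in enumerate(candidates)]
        if any(zeros):
            candidates = zeros        # keep only words with a 0 in this slice
    return [i for i, c in enumerate(candidates) if c], interrogations
```

For the 3-bit words 6, 3, 5, 3 the search marks both copies of 3 after exactly three interrogations; restricting the search to the first and third words returns the 5 instead.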

The timings shown in Table 2 are of two types: T_A and T'_A, which represent the instruction time for any non-memory instruction in the random access and associative cases, respectively; and T_M and T'_M, which represent the compare time for one word in the random access case or for γ bit slices in the associative case.

No attempt was made to introduce exotic or special purpose hardware, with the exception of Lewin's algorithm, and that was introduced as an auxiliary memory. The various memories were constrained to be as much alike as possible, including the instruction set.

Storage  Structures 

In general a node contains three types of information: information which relates one node to another, called linkage information or simply links; selection information (keys), which provides the means to select or identify a node (usually uniquely) from other nodes; and other information appended to the node. The last type of information is not considered here because it is not germane to the research, other than to serve as a reason for investigating the AM/RAM architecture. Selection information is discussed under algorithms.

Storage structures in random access memories are usually classified by the type of linkage information contained in a node [6, 7, 13, 14, 36]. As explained earlier for linear lists, the common types of structures are sequential, single linked, and double linked. The idea of classifying storage structures by linkage information was extended to associative architectures in Figure 4, where each architecture has a particular linkage structure associated with it. Shown to the right of the linkages in the figure are the actual named links for each architecture. As was pointed out previously, an information structure may be represented by several alternative storage structures. However, for this research a single storage structure was selected for each architecture that seemed best to fit the architecture. The various storage structures will now be discussed; this will complete the model description, made up of the architecture and the storage structure.

The primary storage structure for the random access memory in discrete simulation is the circular double linked linear list (CDLLL). This particular structure was discussed in Chapter II. The structure is usually implemented with two links, known as the left link (LLINK) and the right link (RLINK). Although the CDLLL is usually used for all lists in discrete simulation, it is possible and practical to use a single linked linear list (SLLL) to maintain the list of available storage. This is a list made up of the empty nodes not currently in use in the discrete simulation. Knuth uses this procedure, and therefore in this particular case the SLLL is used in the research to maintain the list of empty nodes.

This choice is also favorable to the RAM.
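An array-based sketch of the RAM storage structure: a circular doubly linked list with a header node and LLINK/RLINK fields, plus a singly linked AVAIL list of empty nodes threaded through RLINK. It is written in Python rather than MIX, follows Knuth's conventions only loosely, and the class and method names are hypothetical. Inserting in ascending key order and removing from the left end gives the lowest-key-first discipline of a time flow mechanism.

```python
class RamList:
    """Circular doubly linked list (header node 0) plus a singly linked AVAIL list."""

    def __init__(self, capacity):
        size = capacity + 1
        self.key = [None] * size
        self.llink = [0] * size
        self.rlink = [0] * size
        # Header node 0 links to itself on both sides: the list is empty.
        # Free nodes 1..capacity are chained through RLINK; 0 ends the AVAIL list.
        self.avail = 1 if capacity else 0
        for i in range(1, capacity):
            self.rlink[i] = i + 1

    def insert(self, k):
        """Take a node from AVAIL and link it in ascending key order."""
        if self.avail == 0:
            raise MemoryError("AVAIL list empty")
        node = self.avail
        self.avail = self.rlink[node]           # pop from the singly linked AVAIL
        self.key[node] = k
        p = self.rlink[0]
        while p != 0 and self.key[p] <= k:      # scan right past smaller/equal keys
            p = self.rlink[p]
        left = self.llink[p]                    # link node just left of p
        self.llink[node], self.rlink[node] = left, p
        self.rlink[left] = node
        self.llink[p] = node

    def remove_first(self):
        """Unlink the lowest-key node, return its node to AVAIL, and return the key."""
        node = self.rlink[0]
        if node == 0:
            raise IndexError("list empty")
        self.rlink[0] = self.rlink[node]
        self.llink[self.rlink[node]] = 0
        self.rlink[node] = self.avail           # push onto AVAIL
        self.avail = node
        return self.key[node]
```

Because the scan stops only at the first strictly larger key, nodes with equal keys are served in arrival order.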

In the pure associative memory it is also adequate to use only
an SLLL. The particular single link used in the associative memory
storage structure is the list name (LN) link. Therefore each list
needed for a discrete simulation stored in an associative memory has
one link which uniquely identifies all nodes belonging to that list.
To conduct a particular operation on a particular list it is
necessary only to preface that operation with an equal search based on LN,
which in turn will mark all current nodes belonging to that list.
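An operation prefaced by an equal search on LN can be sketched as follows, assuming each associative word is a dict of fields (None marking an empty word) and that the match indicators come back as a list of bits. The loop only simulates what the hardware does in a single parallel interrogation; `equal_search`, `responders` and the PK field are hypothetical names.

```python
def equal_search(memory, field, value):
    """Match-indicator vector: bit set where word[field] == value.

    Hypothetical sketch: real hardware compares every word at once;
    this comprehension merely simulates that one interrogation.
    """
    return [1 if word is not None and word.get(field) == value else 0
            for word in memory]


def responders(match):
    """Addresses of the words whose match indicator is set."""
    return [addr for addr, bit in enumerate(match) if bit]


# Nodes of two lists stored in arbitrary order; None is an empty word.
memory = [
    {"LN": 7, "PK": 40},
    None,
    {"LN": 3, "PK": 15},
    {"LN": 7, "PK": 12},
]

# Preface any operation on list 7 with an equal search on LN.
match = equal_search(memory, "LN", 7)
```

Here `match` marks words 0 and 3, the two nodes of list 7, regardless of where they sit in memory.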

The associative memory used in conjunction with a random
access memory uses two links and is therefore classified as being
doubly linked. The first link, LN, is used in the same way as in
the associative-only case, and a second link is added to permit
splitting the node into two parts. The second link is the RAM node
address link (RNA) and, as its name implies, it is the address of a
node in random access memory that contains information appended to
the node stored in the associative memory. The introduction of the
second link permits the number of words that are needed in the
associative memory to be reduced to just the number needed to locate
a node uniquely. It should be pointed out that the RNA field could
just as easily be the address of a secondary storage location such
as a disc. Since the RNA points to a node in RAM, it resides
physically in the associative memory.
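The two-part AM/RAM node can be sketched as follows, assuming dict-based associative words that hold only LN, a search key (PK here) and the RNA link, with the appended information kept in an ordinary RAM list indexed by RNA. The associative search is simulated with a loop; `am`, `ram`, `find` and `fetch_appended` are hypothetical names.

```python
# Associative part (hypothetical layout): just enough to locate a node.
am = [
    {"LN": 7, "PK": 40, "RNA": 2},
    {"LN": 3, "PK": 15, "RNA": 0},
    {"LN": 7, "PK": 12, "RNA": 1},
]

# RAM part: appended information, addressed by the RNA link.
ram = ["info for LN3 PK15", "info for LN7 PK12", "info for LN7 PK40"]


def find(am, ln, pk):
    """Simulated equal search on LN and PK together."""
    return [a for a, w in enumerate(am) if w["LN"] == ln and w["PK"] == pk]


def fetch_appended(am, ram, ln, pk):
    """Locate the node associatively, then follow RNA into the RAM."""
    hits = find(am, ln, pk)
    if not hits:
        return None
    return ram[am[hits[0]]["RNA"]]
```

Only the short associative words take part in the search; the appended information is reached by one indirect RAM access.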

The storage structure for the fourth architecture is triply
linked. As in the previous case, LN and RNA are used in the same
way. The third link is the associative memory address (AMA). The
AMA link is a self link that is transferred to the AML memory when
the node is transferred, and it points back to the associative memory
node location. This permits a node selected in the AML to be first
referenced back to the associative memory via the AMA link and then
to the appended information in the RAM via the RNA link. This
saves time and storage by avoiding the transfer of appended
information into the AML. This triply linked structure assumes that
Lewin's memory is used as an auxiliary memory to the AM/RAM. In certain
cases Lewin's memory is used directly without being activated by
a prior transfer from the AM portion of the AM/RAM memory. In such
cases, where the new node is placed directly within Lewin's
memory, the appended information is placed within the RAM and a RAM
node address - Lewin (RNAL) link is substituted for the RNA link.
In such a case the AMA link is not needed, reducing that particular
node to two links. It is anticipated that such a condition would
represent a small portion of the total node usage in discrete
simulation.
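The AMA self link can be illustrated with a small sketch, assuming dict-based words and a RAM list; `transfer_to_aml` and `resolve` are hypothetical names, and Python's `sort` merely stands in for the ordering that Lewin's algorithm performs in the AML (Lewin's algorithm itself is not modeled).

```python
am = [
    {"LN": 7, "PK": 40, "RNA": 2},
    {"LN": 7, "PK": 12, "RNA": 0},
    {"LN": 7, "PK": 11, "RNA": 1},
]
ram = ["appended info X", "appended info Y", "appended info Z"]


def transfer_to_aml(am, addresses):
    """Copy only the key plus the AMA self link into the AML.

    Hypothetical sketch: the appended information stays in RAM.
    """
    return [{"PK": am[a]["PK"], "AMA": a} for a in addresses]


def resolve(entry, am, ram):
    """AML entry -> AM word via AMA -> appended info in RAM via RNA."""
    return ram[am[entry["AMA"]]["RNA"]]


aml = transfer_to_aml(am, [1, 2])
# Stand-in for the ordering step done in the AML; not Lewin's algorithm.
aml.sort(key=lambda e: e["PK"])
ordered_info = [resolve(e, am, ram) for e in aml]
```

The AML holds only keys and one link per node, yet every ordered entry still leads back to its full node in two link traversals.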


In all cases the storage structures are linear lists in the
sense that the links only refer to information within the same
list. There are no links stored between information in different
lists. An additional assumption is also made about the storage
structures, namely that a uniform node size is used throughout
a particular simulation. This means that all nodes in all lists
use the same number of words. As Knuth implies, this greatly
simplifies the implementation of allocation and deallocation
algorithms for the RAM. This assumption does not seem to represent
a practical limitation in discrete simulation with regard to
wasted storage, since it is frequently the case that the numbers of
words needed per node in the various lists are within a few words of
one another. Additionally, the comparison of nonuniform node sizes
within the context of this research is a significant undertaking,
and it is suggested in Chapter V that this be considered as a
follow-on research topic.

3.3.  Research  Algorithms 

The  purpose  of  this  section  is  to  describe  those  computational 
activities  which  will  overlay  the  models.  The  computational 
activities  or  research  algorithms  will  then  form  the  model 
driving  function.  The  response  of  the  model  to  this  workload 
will  be  measured  in  the  manner  described  in  the  next  section. 

The algorithms are shown in Figure 4. These algorithms, for
reasons described later, are then merged into node cycles
and then further into a composite node cycle. It is this
last entity, the composite node cycle, that serves as a formal
model task to be measured.

Algorithms in General

This section is not intended to discuss algorithms in detail,
since that is done in Appendix B. Rather, the purpose here is to
describe and classify the types of algorithms used in discrete
simulation. As discussed earlier, discrete simulation as considered
in this research is a dynamic activity. As such it requires the
creation, storage, retrieval and eventual use of data. After data
is used it is generally destroyed. Since the data is stored in
nodes in such a fashion that all the data within a node are
usually used at one time, the behavior of data within discrete
simulation may be likened to a stochastic birth and death process,
where the nodes are spawned by some stochastic mechanism (born),¹
exist for some period of time (live), and then die. To carry this
idea somewhat further, the three portions of the life cycle will
be used in an analogous fashion to outline the necessary algorithms.

The birth process involves the creation of the data, the
allocation of the node which is to receive the data from a list of
available nodes, the placement of the data into the node, and
finally the insertion of the node into some sort of storage
structure. The birth process, then, can be defined by four types of
algorithms: creation, allocation, placement and insertion. Of
these, creation is usually a numeric process and is therefore not
the subject of this research. Of the remaining three, placement
has been divided between allocation and insertion, since the different
memory organizations require different contents within the nodes to
support the particular information structure. Therefore, to account
for these differences, a family of allocation and insertion²
algorithms has been set up, each appropriate to the information
structure and memory organization, so that the effect of the
additional data required within the node can be measured. The additional
data is defined as that data which is not common to all memory
organizations for a particular information structure. For instance, where
the random access memory by itself uses links and the associative
memory does not, there are different algorithms for allocation and
insertion. At the outset of the research a separate algorithm for
each memory organization and storage structure was provided.
However, because of some commonality, this was reduced to a subset of
the original algorithm list.

¹One cannot always guarantee formally that the node
process is in fact a continuous parameter Markov chain with
homogeneous transition intensities id].

The  life  process  is  merely  the  storage  over  time  of  a node 
within  a storage  structure.  Although  there  are  no  algorithms 
directly  concerned  with  this  phase,  the  fact  that  nodes  are  present 
within  a storage  structure  has  a bearing  on  the  measurable  behavior 
of  the  algorithms  associated  with  birth  and  death. 

The eventual utilization and then destruction of the information
are grouped under the death of the node because they are adjacent
in time. The death of the node begins when it is selected by some
retrieval process. Retrieval deletes the node from the storage
structure. This is followed by the removal of the information from
the node and then the deallocation of the node, or its return to a pool
of available nodes. The death process can then be defined by four
algorithms: deletion (retrieval), removal of information, usage
of information and deallocation. Of these, usage is usually a
numeric process and is not the subject of this research. Removal,
like placement, is divided between deletion and deallocation, again
for the same reasons. This means that death can be defined by the
two algorithms of deletion and deallocation. The exception to this
occurs when considering cases four, five and six under priority
queues. Here there is the additional step of search. This is so
because deletion in the RAM architecture normally assumes that there
is a real or implied order to the storage structure from which the
node is deleted. This is not true in cases four, five and six, so
the deletion is prefaced by a search algorithm.

²Insertion includes sorting or ranking the storage
structure for priority queues for the RAM architecture.

Search Keys - General

In terms of specific implementation, certain keys are
necessary. These are listed in Table 3 and discussed below. The keys
are discussed first along with a description of the various priority
queues considered in the research. This discussion will center
around Table 3.

Search Keys - Queues

Cases one and two in Table 3 are for the FIFO discipline,
while cases three and four are for LIFO. As indicated in the table,
there are no keys for the RAM architecture, and therefore cases one,
two, three and four are the same. This is because in the RAM
storage structure queues are maintained by physical order; hence
searches, and thus search keys, are unnecessary. The insertion
algorithm simply places the next node first in the list and the deletion
algorithm removes the appropriate node in the list--the one with the
longest (FIFO) or shortest (LIFO) waiting time in the list. In the
three associative architectures the lists are maintained in a random
order and therefore searches are necessary. This means that a
primary key upon which to search must be provided. Two different
primary keys are studied under the FIFO discipline and two under the
LIFO discipline. The first key is a time of entry (TOE) key (cases
one and three), which reflects the simulation clock at the time the
entry is made. The second is a counter (CTR) key (cases two and
four). An index register is incremented or decremented by one for
each insertion or deletion. The value of the index register(s) is
then compared against the CTR key. It is possible to use a list of
contiguous numbers for queues with FIFO or LIFO disciplines because
insertion and deletion occur in a regular manner. In the FIFO case
two index registers are used, whereas for LIFO only one is needed.
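The counter-key scheme can be sketched as follows, assuming an unordered dict stands in for the associative store and plain integers stand in for the index registers; the class and attribute names are hypothetical. FIFO keeps two registers (the CTR value given to the next insertion and the one sought by the next deletion), while LIFO keeps one register that is incremented on insertion and decremented on deletion.

```python
class FifoCtr:
    """Hypothetical FIFO on unordered storage using two index registers."""

    def __init__(self):
        self.words = {}      # word address -> (CTR key, data); order irrelevant
        self.next_addr = 0
        self.tail = 0        # CTR value assigned to the next insertion
        self.head = 0        # CTR value searched for by the next deletion

    def insert(self, data):
        self.words[self.next_addr] = (self.tail, data)
        self.next_addr += 1
        self.tail += 1

    def delete(self):
        # Simulated equal search on CTR == head; one responder expected.
        hits = [a for a, (ctr, _) in self.words.items() if ctr == self.head]
        if not hits:
            raise IndexError("queue empty")
        _, data = self.words.pop(hits[0])
        self.head += 1
        return data


class LifoCtr:
    """Hypothetical LIFO on unordered storage using one index register."""

    def __init__(self):
        self.words = {}
        self.next_addr = 0
        self.top = 0         # number of nodes currently in the list

    def insert(self, data):
        self.words[self.next_addr] = (self.top, data)
        self.next_addr += 1
        self.top += 1

    def delete(self):
        if self.top == 0:
            raise IndexError("stack empty")
        self.top -= 1
        # Simulated equal search on CTR == top.
        hits = [a for a, (ctr, _) in self.words.items() if ctr == self.top]
        _, data = self.words.pop(hits[0])
        return data
```

Because insertions and deletions happen in a regular order, the live CTR values always form a contiguous run, so a single equal search finds the node to delete.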

Search  Keys  - Priority  Queues 

The prime interest in studying priority queues for discrete
simulation is that they are the information structure used to
implement time flow mechanisms. Cases one, two and three under priority
queues, along with their individual subcases, are introduced
specifically for this purpose. In the random access architecture it is
assumed that the priority queue is based on a rank ordering of the
nodes on the primary key. Priority queues also come up in the
context of general searches. In this role priority queues are used to
find the first node that meets some specified key value, to find all
the nodes that meet a specified key value, or to find the minimum
or maximum node based on a specified key. In this situation it is
assumed that the list is either unranked or ranked on some key other
than the one currently participating in the search. These latter
three cases are covered in priority queue cases four, five and six.

Search Keys - Priority Queues, Cases 1, 2, 3

The  implementation  of  the  TFM  priority  queue  is  somewhat  more 
complex  than  implementing  other  types  of  priority  queues,  first 
because  there  is  frequently  more  than  one  key  to  deal  with,  and 
second  because  of  the  manner  in  which  new  nodes  arrive  in  the  list. 
In  the  latter  area,  new  nodes  represent  new  future  potential  state 
changes  within  the  simulation  model.  As  such  their  primary  key  is 
a future  simulation  time.  Since  simulations  do  not  back  up,  there 
is  a guarantee  that  the  value  of  the  primary  key  is  always  greater 
than or equal to the present simulation time. It is also usual
that the value of the primary key for new nodes entering the TFM
list places them near the bottom of the ranked list, and there are
usually several nodes at the top of the list which always have
key values less than the new node key value. This creates a
situation where the top part of the list can be dealt with at
various times independently of the remainder for the purpose of
retrieval, since the node relationships do not change with new
arrivals.

It is for this reason that discrete simulation priority
queues are maintained within the random access memory by a sort-in
from the bottom of the list.
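A minimal sketch of the sort-in from the bottom follows, assuming a Python list of (time, label) pairs in ascending order stands in for the ranked CDLLL (walking it from the bottom corresponds to following LLINK from the header); `sort_in_from_bottom` is a hypothetical name. It returns how many nodes were passed over, which stays small when new event times land near the bottom. Stopping at the first node whose key is less than or equal to the new key also places the new node after its equals, so ties are broken FIFO, as in a stable sort.

```python
def sort_in_from_bottom(queue, time, label):
    """Insert (time, label) into an ascending list, scanning from the end.

    Hypothetical sketch: a list stands in for the linked TFM list.
    Returns the number of nodes passed over during the sort-in.
    """
    pos = len(queue)
    passed = 0
    # Move up past every node whose key is strictly greater.
    while pos > 0 and queue[pos - 1][0] > time:
        pos -= 1
        passed += 1
    # Stopping at the first key <= time keeps equal keys in arrival order.
    queue.insert(pos, (time, label))
    return passed


tfm = [(1.0, "a"), (2.0, "b"), (5.0, "c")]
first = sort_in_from_bottom(tfm, 4.0, "d")   # passes only node c
second = sort_in_from_bottom(tfm, 2.0, "e")  # lands after b, its equal
```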

To  study  the  effects  of  dealing  exclusively  with  the  TFM 
priority  queue,  three  cases  are  used.  The  a,  b,  c subcases  of 
these  three  cases  will  be  discussed  later.  Cases  one,  two  and 
three  differ  only  by  the  number  of  keys.  They  represent  respec- 
tively the  situations  where  a single  primary  key  (PK)  is  used  for 
retrieval,  where  a primary  key  is  used  with  a secondary  key  (SK) 
to  establish  priority  among  equals,  and  finally  (for  the  asso- 
ciative architecture)  where  a third  key  is  added  for  the  default 
ranking  that  is  built  into  the  RAM  algorithms.  The  third  key 
is  not  needed  in  the  RAM  because  of  the  nature  of  the  insertion 
algorithm  working  in  conjunction  with  the  RAM  storage  structure. 
The  insertion  algorithm  involves  the  two-step  process  of  first 
sorting  the  node  into  the  proper  position  within  the  list  and  then 
arranging  the  proper  linkage.  By  virtue  of  the  sort-in  step,  ties 
in  key  values  are  automatically  broken  by  a FIFO  ranking.  This 
is sometimes referred to as default ranking or stable sorting.
This is not the case within the associative architecture, where an
extra key must be used to break ties. This extra key is used in
case three.

The primary key (PK) is the main value used to operate the
priority queues. In the case of time flow mechanisms it would be
the simulation time. In some cases this is not adequate and a
priority key is used in conjunction with, and acts as a refinement
of, the primary key. This is indicated here as the secondary key
(SK). Finally, a tertiary key (TK) is used in the associative
cases to implement the stable sort. This last key acts as another
refinement of the primary key.

There are three principal search techniques used in
conjunction with cases one, two and three for the purpose of studying
priority queues and time flow mechanisms. The first is the standard
technique of searching for the minimum value based on a
sort-maintained list. This technique applies to the RAM. The second
technique is Feng's Y - 1 minimum associative search referred to
previously. This is used for the AM and AM/RAM architectures.
The third technique is based on Lewin's algorithm and applies to
the AM/RAM/AML architecture. The first two techniques are similar
and are used to implement the variable time increment TFM either
on the RAM, the AM or the AM/RAM.
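A bit-serial minimum search of the kind used for the AM and AM/RAM can be sketched as follows. This is a generic most-significant-bit-first minimum search, not a reproduction of Feng's published procedure; `pack` and `bit_serial_min` are hypothetical names, and each loop step stands for one parallel interrogation of a single bit column. Concatenating PK, SK and TK into one field, with PK in the high-order bits, lets a single search honor all three keys, and the number of steps equals the total field width rather than the number of nodes.

```python
def pack(fields, widths):
    """Concatenate key fields (PK first) into one word, high to low order."""
    word = 0
    for value, width in zip(fields, widths):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit its width")
        word = (word << width) | value
    return word


def bit_serial_min(words, width):
    """Addresses holding the minimum word, one bit slice per step.

    Hypothetical sketch: each iteration models one parallel compare of a
    bit column, so the step count equals `width`.
    """
    remaining = set(range(len(words)))
    for bit in reversed(range(width)):          # MSB down to LSB
        zeros = {a for a in remaining if not (words[a] >> bit) & 1}
        if zeros:                               # some responder has a 0 here,
            remaining = zeros                   # so drop those with a 1
    return sorted(remaining)


# (PK, SK, TK) per node; TK is an arrival counter that breaks ties.
nodes = [(20, 1, 0), (12, 3, 1), (12, 2, 2), (12, 2, 3)]
with_tk = [pack(n, (8, 4, 4)) for n in nodes]
without_tk = [pack(n[:2], (8, 4)) for n in nodes]
```

Without TK, nodes 2 and 3 tie on PK and SK and both remain as responders; adding TK leaves only the earlier arrival.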

The third technique, based on some experimentation and on
the work of Morgan and Siegel [50], is an algorithm worked out to
use Lewin's memory and algorithm effectively. This algorithm
is called the fixed increment minimum value (FIMV) TFM. Lewin's
algorithm is a subset of the FIMV time flow mechanism. The FIMV
algorithm combines the fixed time increment and variable time
increment techniques to yield a TFM that returns nodes in order,
as in the VTI TFM, but has the speed of the FTI TFM without the
errors mentioned earlier. The FIMV technique works in the
following manner. A time increment Δt is established at the outset
of the simulation. It is selected so that there is a high
probability that there will be at least one potential state change
occurring within Δt. This is the approach opposite to the normal
way a time increment is selected for the FTI TFM [16]. The
simulation clock register is incremented by the amount Δt and a
less than or equal search is conducted. This is a parallel search,
as opposed to the minimum value search. If the search returns no
nodes, the clock is again incremented. If one node is returned,
the clock is set to the event time, that node is processed, and
the clock is again incremented. Regardless of whether there are
one (case one), two (case two) or three (case three) keys involved
for the priority queue, the less than or equal search is only
conducted on the primary key for the fixed increment part. If
the search returns more than one node, then all returned nodes are
transferred to the AML. This transfer includes all keys
(one, two or three) and any links that are necessary. The keys
are then stored in the AML horizontally from left to right (highest
order to lowest order bit) instead of vertically by word. This is
because Lewin's algorithm operates on all bits simultaneously.
Lewin's algorithm is then initiated to return the nodes in order.

In this way the fixed increment portion is a quick way to select
the top independent nodes of the list, and Lewin's algorithm is a
quick way to order just those nodes selected by the fixed increment,
and not all the nodes.
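The FIMV loop described above can be sketched as follows, under assumptions stated here and in the code: all events are known in advance (processing them schedules nothing new), the parallel less-than-or-equal search is simulated with a list comprehension, and Python's `sorted` stands in for the transfer to the AML and Lewin's ordering step, which is not modeled. `fimv_run`, `delta_t` and `horizon` are hypothetical names.

```python
def fimv_run(event_times, delta_t, horizon):
    """Event times in the order a FIMV-style TFM would process them.

    Hypothetical sketch: no new events are scheduled during processing,
    and sorted() replaces the AML transfer plus Lewin's ordering.
    """
    if delta_t <= 0:
        raise ValueError("delta_t must be positive")
    pending = list(event_times)        # unordered, as in associative memory
    clock = 0.0
    processed = []
    while pending and clock <= horizon:
        threshold = clock + delta_t    # increment the clock register by delta_t
        # Parallel less-than-or-equal search on the primary key only.
        hits = [t for t in pending if t <= threshold]
        if not hits:
            clock = threshold          # no responders: increment again
            continue
        if len(hits) > 1:
            hits = sorted(hits)        # stand-in for AML transfer + ordering
        for t in hits:
            pending.remove(t)
            clock = t                  # clock set to the event time
            processed.append(t)
    return processed


events = [7, 3, 3.5, 12, 4]
order = fimv_run(events, 2, 100)
```

With Δt = 2, the increment ending at time 4 returns three dissimilar responders, which are ordered before processing; every other increment returns zero or one node.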
There are four outcomes for the fixed increment portion of
the FIMV TFM algorithm. These are the return of a single node,
the return of M identical nodes, the return of M dissimilar nodes,
and the return of M nodes, some of which are identical. In this
research the first three outcomes are considered for the purpose
of collecting data. A problem arises in comparing the three
selected outcomes for the FIMV TFM algorithm based on the AM/RAM/AML
architecture with the algorithms used for the RAM, AM, and AM/RAM
architectures. For this reason three subcases for each of the
priority queues are introduced. These are subcases a, b and c,
respectively.
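The mapping from the fixed-increment result to the outcomes and subcases can be sketched as a small classifier, assuming the responders are given as key tuples (PK,), (PK, SK) or (PK, SK, TK); `classify_outcome` and its labels are hypothetical. The empty result is included only because it simply leads to another clock increment.

```python
def classify_outcome(responders):
    """Classify the result of the fixed-increment <= search.

    Hypothetical sketch; `responders` is a list of key tuples.
    """
    m = len(responders)
    if m == 0:
        return "none"          # clock is simply incremented again
    if m == 1:
        return "single"        # subcase a
    distinct = len(set(responders))
    if distinct == 1:
        return "identical"     # subcase b
    if distinct == m:
        return "dissimilar"    # subcase c
    return "mixed"             # fourth outcome, not studied here
```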

In  subcase  a the  single  node  is  the  specified  outcome.  This 
corresponds  to  the  development  specified  so  far  for  a composite 
node  cycle.  That  is,  the  measurement  (discussed  in  the  next 
section)  is  applied  to  a single  composite  node  cycle  whose  output 
is the birth and death of a single node under parameterized
conditions  for  PK  (case  one),  PK/SK  (case  two),  and  PK/SK/TK  (case 
three). 

In subcase b the multiple identical nodes are the specified
outcome. This corresponds in the RAM architecture to repeated
trials of the composite node cycle. In the AM and AM/RAM
architectures this corresponds to repeated trials of the composite
node cycle with the exception that only a single search is required
to support all M composite node cycles. This comes about because
the match indicator in the associative memory indicates ties.
The ties are based on PK alone (case one), PK and SK (case two)
and PK, SK, TK (case three). Subcase b or c is determined in the
MV (minimum value) portion of the algorithm after transfer. For
subcase b, Lewin's algorithm has the inherent ability to determine
if all the entries are identical at the outset. Therefore in
subcase b the FIMV algorithm immediately loads the first responder
(lowest numbered memory location) into the appropriate register(s)
and transfers control to the simulation control code (SCC). The
SCC acts as the discrete simulation sequencer. Nodes are then
removed in order as part of the composite node cycle.

Subcase c, M dissimilar nodes, is the one commonly occurring
in discrete simulation. For the RAM, AM and AM/RAM, M repeated
trials are required for the composite node cycles without any
special savings realized by the AM or AM/RAM architectures. In the
case of the AM/RAM/AML architecture, a savings is realized in the
MV portion of the FIMV algorithm, because Lewin's algorithm takes
on the average two compare cycles per node. This is significantly
different from the Y - 1 minimum search, where computation time
grows linearly with field width. After the outcome of the FIMV
algorithm, control is again transferred to the SCC program, which
requests nodes individually as part of the composite node cycle.

In all outcomes of the FIMV algorithm, cases or subcases, the
simulation clock is always set to the next event time. This is
accomplished automatically in the SCC after node retrieval. This
prevents the time disparity evident in the FTI algorithm. Therefore
it is always applied to the correct time.

In Chapter IV the research results for cases one, two and three
under priority queues are grouped according to subcase--that is,
subcase a is presented first for each of the separate cases, then
subcase b, and finally subcase c. In this manner the effect of the
specified outcomes on each of the architectures can be presented
and compared. Other situations that can occur in the AM/RAM/AML
architecture are discussed in Appendix B.

Search Keys - Priority Queues, Cases 4, 5, 6

Cases four, five and six represent special cases of priority
queues found in discrete simulation. Case four is to find the first
node in a list that meets the search criterion (equivalent to an
FTI TFM), case five is to find all nodes that meet the search
criterion, and case six is to find the minimum or maximum node. These
three cases assume that the list is maintained in a random fashion
in the RAM instead of in physical order for queues or ranked order
for the first three cases of priority queues. This situation of
having to search unordered lists comes about in discrete
simulation when it is necessary to search a list that is ordered on a
key other than the one which is to participate in a search, and it
is either not felt to be worthwhile to maintain the list with two
or more dissimilar keys or the ability to do so is not provided in
the simulation language or package. These last three cases
therefore require the extra search algorithm. All implementations are
based on a single key search.
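For cases four, five and six on an unordered RAM list, the three searches can be sketched as linear scans that also report the quantities the RAM parameters c, d and e describe: nodes examined up to the first success, the number of successes, and, for the minimum, the number of times the running minimum is replaced (one reading of "interchanges", assumed here in the style of Knuth's maximum-finding analysis). A Python list stands in for the linked list, and the function names are hypothetical.

```python
def find_first(keys, target):
    """Case four: scan to the first success.

    Returns (index or None, number of nodes examined).
    """
    for depth, k in enumerate(keys, start=1):
        if k == target:
            return depth - 1, depth
    return None, len(keys)


def find_all(keys, target):
    """Case five: the whole list must be scanned; returns all matches."""
    return [i for i, k in enumerate(keys) if k == target]


def find_min(keys):
    """Case six: full scan; returns (index of minimum, replacements)."""
    if not keys:
        raise ValueError("empty list")
    best, changes = 0, 0
    for i in range(1, len(keys)):
        if keys[i] < keys[best]:
            best, changes = i, changes + 1
    return best, changes


keys = [9, 4, 7, 4, 2, 8]   # unordered list of search-key values
```

Unlike the associative cases, all three costs here depend on list length and on where the qualifying nodes happen to lie.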
Parameters

The parameters necessary to characterize the various cases are
shown in Figure 4. The first group of parameters, a, b, c, d, e
and l, pertains to RAM implementations. The first parameter, a, is
the expected number of nodes that must be sorted through before the
proper location of the node to be inserted can be found. This
assumes no ties. The next parameter, b, is the expected number of ties
that must be resolved before the node to be inserted can be properly
located. Parameter c represents the expected list depth in nodes
before the first success. It pertains to case four under priority
queues. Parameter d is the expected number of successful nodes in
case five under priority queues. Parameter e represents the expected
number of interchanges for selecting the maximum or minimum (case six),
and l represents the list length in nodes.

In the associative cases, the only parameter is $f_{i,k}$, which
represents the width in bits of the $k$th field in the $i$th composite
node cycle (discussed below). In short, list length and node
position are unimportant in the associative architectures used in this
research.

Node Cycle

Studying the behavior of the nodes under the conditions
described in the previous section on algorithms again suggests a
comparison to the birth and death process. To do this, the concept of
a node cycle is introduced. The various steps in the birth and
death process and node cycles are illustrated in Figure 6. At the
center of the graph is node management, which manages the nodes by
way of the algorithms previously discussed. Requests for
non-numeric processing can come either from the use/create cycle, which
is typically part of the model description function, or directly
through the time flow mechanism. A node cycle is then defined as
the completion of a non-numeric path starting with and terminating
on node management. Examples are allocate-placement-insert (birth)
or search-delete-removal-deallocate (death). The removal step
provides the node information to the use step.

Figure 6. Discrete Simulation Node Cycles

Composite Node Cycle

The composite node cycle is made up by concatenating the birth
and death cycles. In other words, the total lifetime of the node in
the information structure is measured for each memory organization
and each information structure. This was done not only because of
the many alternatives possible but because there are steps
(algorithms) in the node cycle where one or the other of the memory
organizations shows an advantage. Therefore a comparison on a cycle or
step basis may be misleading in terms of overall performance. This
may appear to be an aggregate approach, which it partially is, but the
research is so laid out that the individual steps causing poorer
performance of one or the other memory organization can be
individually investigated as part of the extension to the research. The
composite node cycles are shown in Appendix B in Tables B-1, B-2
and B-3 for each memory organization.
3.4. Measurements


The criterion used for comparison in this research is total
computer model computation time for a composite node cycle. As will
be seen in Chapter IV, this results in a parametric equation. This
approach is in keeping with the approach used by Knuth to determine
the usefulness of various algorithms. Alternative measures such as
the number of compares, the number of memory instructions or the
number of memory interrogations (without qualification) were
considered and discarded as inappropriate for any of the following reasons:

1. Not all the algorithms encountered within a node cycle
used compares.

2. During a node cycle in one memory certain algorithms
would not use compares, but in another memory they would. The simple
queues are an example.

3. During the node cycle, in some cases, a percentage of
the work involved auxiliary rather than memory instructions.

4. A count of memory interrogations (without qualification) is
insensitive to field width, and hence does not reflect the total
time needed by certain memories to complete a certain process.

5. Measures other than total time are insensitive to some
or all of the following: the processing rates, the processing
widths (number of bits simultaneously active) and the transfer
width (number of bits transferred simultaneously per unit
of time).

6. Overhead is not considered.

7. Relative memory speeds are not differentiated.

The selection of total time then represents a superset of the
previously considered measures. As such any of the other measures

The use of total time also permits a consideration of the relative
speeds of the various memories through their instruction set
timing. This allows the relative speed to be introduced as a
parameter. The importance of this has been brought out by Thurber and
Berg [68] and also by Weinberger [70], since total cost is tied
closely to memory speed. In other words, if the results indicate
that the associative memories must be much faster than the random
access memories to achieve an overall processing advantage, there is
reduced interest in studying them until technology reduces their
cost [30].
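Because the criterion is total time built up from instruction set timing, with relative memory speed as a parameter, it can be sketched as a weighted sum. This is a minimal illustration, not the timing model used in this research: the instruction classes, the "am_" naming convention, the function names and all numbers below are hypothetical.

```python
def total_time(counts, times):
    """Total computation time of one composite node cycle.

    Hypothetical sketch:
    counts: instruction class -> number of instructions executed
    times:  instruction class -> execution time per instruction
    """
    missing = set(counts) - set(times)
    if missing:
        raise KeyError(f"no timing given for {sorted(missing)}")
    return sum(n * times[cls] for cls, n in counts.items())


def with_relative_speed(times, am_speed_ratio):
    """Divide associative-memory instruction times by a speed ratio.

    By the hypothetical convention used here, associative instruction
    classes are prefixed "am_"; a ratio of 2 makes them twice as fast.
    """
    if am_speed_ratio <= 0:
        raise ValueError("speed ratio must be positive")
    return {cls: t / am_speed_ratio if cls.startswith("am_") else t
            for cls, t in times.items()}


# Illustrative numbers only; they are not measurements from this research.
counts = {"ram_load": 10, "ram_store": 4, "am_compare": 16}
times = {"ram_load": 2, "ram_store": 2, "am_compare": 1}
baseline = total_time(counts, times)
faster_am = total_time(counts, with_relative_speed(times, 2))
```

Treating the speed ratio as a parameter makes it possible to ask how much faster an associative memory would have to be before its total time drops below the RAM's.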
CHAPTER IV


RESEARCH  RESULTS 


4.1.  Introduction 

The method used to portray the results of comparing the
various architectures is based on the dichotomy that total model
computation time for a composite node cycle for the RAM is based
on the presence of other nodes within a list, while for the
associative cases it is based on the bit width of keys and links.

To exploit this dichotomy the concept of a breakeven bit width $E_q$
is introduced. This is the value in terms of associative memory
compares that is allocated to the combined search key and link
fields used by the associative memory within a particular
composite node cycle. To generate the $E_q$ equations the total time
taken by the RAM architecture is set equal to the total time ($TT_i$)
taken by the AM, AM/RAM or AM/RAM/AML architecture for the $i$th
composite node cycle. This concept can be formulated in terms of
Equations 4-1a, 4-1b and 4-1c below.

MIX-RAM vs. MIX-AM:

$$TT_i(P_{RAM}) = TT_i(P_{AM}) \tag{4-1a}$$

MIX-RAM vs. MIX-AM/RAM:

$$TT_i(P_{RAM}) = TT_i(P_{AM/RAM}) \tag{4-1b}$$

MIX-RAM vs. MIX-AM/RAM/AML:

$$TT_i(P_{RAM}) = TT_i(P_{AM/RAM/AML}) \tag{4-1c}$$

$P_{RAM}$, $P_{AM}$, $P_{AM/RAM}$ and $P_{AM/RAM/AML}$ are respectively the parameters appropriate to each architecture. Equation 4-1a, 4-1b or 4-1c is a shorthand form of Equation 4-2. On the left hand side of Equation 4-2, $\gamma_{i1}$ and $\gamma_{i2}$ represent the RAM parameters, $P_{RAM}$, as coefficients of the two RAM computation times. The right hand side of Equation 4-2 is made up of two parts. The first, which is the linear summation of all associative compare times, will become $E_q$. The second part consists of the $\gamma_{i3}$ and $\gamma_{i4}$ coefficients of the two associative computation times. They represent the $P_{AM}$, $P_{AM/RAM}$ or $P_{AM/RAM/AML}$ parameters, depending on whether Equation 4-2 corresponds to Equation 4-1a, 4-1b or 4-1c. Here $T_M$ and $T_A$ are the RAM compare (memory) and auxiliary instruction times, and $T'_M$ and $T'_A$ are the corresponding associative times.

$$\gamma_{i1} T_M + \gamma_{i2} T_A = \sum_k f_{i,k}\, T'_M + \gamma_{i3} T'_M + \gamma_{i4} T'_A \tag{4-2}$$

In Equation 4-2, $i$ pertains to the $i$th composite node cycle and $k$ to the associative fields used to participate in composite node cycle $i$.

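All the graphs used to illustrate the results are based on Equation 4-2 with certain assumptions, which will follow, applied. The equivalent breakeven bit width ($E_q$) is calculated as follows from Equation 4-2:

$$\sum_k f_{i,k}\, T'_M = \gamma_{i1} T_M + \gamma_{i2} T_A - \gamma_{i3} T'_M - \gamma_{i4} T'_A \tag{4-3a}$$

$$E_q = \sum_k f_{i,k} = \gamma_{i1}\frac{T_M}{T'_M} + \gamma_{i2}\frac{T_A}{T'_M} - \gamma_{i4}\frac{T'_A}{T'_M} - \gamma_{i3} \tag{4-3b}$$

$$E_q = \sum_k f_{i,k} = \gamma_{i1}\alpha + \gamma_{i2}\beta_1 - \gamma_{i4}\beta_2 - \gamma_{i3} \tag{4-3c}$$

where $\alpha = T_M/T'_M$, $\beta_1 = T_A/T'_M$ and $\beta_2 = T'_A/T'_M$.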
In Equation 4-3b, the total search key and link field width in bits ($\sum_k f_{i,k}$) for the associative architectures is set equal to a linear combination of the $\gamma_{ij}$'s and the computation times. In Equation 4-3c, three additional parameters are introduced, $\alpha$, $\beta_1$ and $\beta_2$, representing various timing ratios. Equation 4-3c is sufficiently complete to consider trade-off studies of architecture performance based on $\beta_1$ and $\beta_2$ for an equivalent breakeven bit width. In one respect Equation 4-3c is a primary result of this research, in that it represents the culmination of one inclusive (although lengthy) approach to comparing various associative and sequential architectures by the two-step approach of specifying the architecture in terms of the instruction set and specifying the algorithm in terms of the storage structure and instruction set. Extended approaches based on $E_q$ should yield an equation of the same form as Equation 4-3c with perhaps more or fewer $\gamma_{ij}$'s.
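As a numerical check on the algebra, the sketch below evaluates $E_q$ both from Equation 4-3a (raw instruction times) and from Equation 4-3c (timing ratios) and confirms that the two forms agree. The coefficient tuple `g` and all timing values are hypothetical placeholders, not entries from Table 4 or Appendix B; exact fractions are used so that the two results can be compared without rounding error.

```python
from fractions import Fraction as F

def eq_from_times(g, t_m, t_a, tp_m, tp_a):
    """E_q from Equation 4-3a: solve for sum_k f_{i,k} by dividing through by T'_M."""
    g1, g2, g3, g4 = g
    return (g1 * t_m + g2 * t_a - g3 * tp_m - g4 * tp_a) / tp_m

def timing_ratios(t_m, t_a, tp_m, tp_a):
    """alpha = T_M/T'_M, beta1 = T_A/T'_M, beta2 = T'_A/T'_M."""
    return t_m / tp_m, t_a / tp_m, tp_a / tp_m

def eq_from_ratios(g, alpha, beta1, beta2):
    """E_q from Equation 4-3c: gamma_i1*alpha + gamma_i2*beta1 - gamma_i4*beta2 - gamma_i3."""
    g1, g2, g3, g4 = g
    return g1 * alpha + g2 * beta1 - g4 * beta2 - g3

# Hypothetical coefficients (gamma_i1, gamma_i2, gamma_i3, gamma_i4) and times.
g = (20, 10, 4, 6)
t_m, t_a, tp_m, tp_a = F(2), F(1), F(2), F(1)

alpha, beta1, beta2 = timing_ratios(t_m, t_a, tp_m, tp_a)
print(eq_from_times(g, t_m, t_a, tp_m, tp_a), eq_from_ratios(g, alpha, beta1, beta2))  # 18 18
```

The hypothetical times happen to give $T_A = T'_A$, so $\beta_1 = \beta_2 = 0.5$ and $\alpha = 1$.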

Equation  4-3c  is  still  too  complex  to  yield  simple  graphic 
results.  Therefore  two  assumptions  are  applied  to  the  equation 
to  yield  the  results,  which  are  presented  graphically  later  in  the 
chapter. 

The first assumption used for plotting is that $T_A = T'_A$; that is, that the time required to execute an auxiliary non-memory instruction such as increment, decrement, et cetera, is the same for all architectures. This assumption is a direct result of the manner in which the architectures were defined. Each architecture has the same registers and the same set of auxiliary instructions. As mentioned earlier, this similarity was enforced to attempt, within a comparable architectural framework, to isolate the SISD approach from the SIMD approach. The result of the assumption is that $\beta_1$ equals $\beta_2$, and thus a single parameter $\beta$ may be used to yield Equation 4-4 from Equation 4-3c.

$$E_q = \sum_k f_{i,k} = (\gamma_{i2} - \gamma_{i4})\beta + \gamma_{i1}\alpha - \gamma_{i3} \tag{4-4}$$

The mathematical comparison and the graphs are all based on Equation 4-4.

The second assumption used for the graphical presentation of the results is that beta is equal to 0.5. This is the value Knuth uses for his books, and a few years ago it did represent the average ratio of auxiliary instruction time to compare instruction time based on commercial implementations of RAM computers. Currently the trend seems to indicate beta should be closer to 1.0; however, for consistency with Knuth, 0.5 will be used, and discussion is provided for the general effect of beta.

Equation 4-4 does not yet represent the final form taken by the results. Two additional parameters, $\gamma$ and $\delta$, must be introduced, giving Equation 4-5:

$$E_q = \sum_k \frac{f^{s}_{i,k}}{\gamma} + \sum_k \frac{f^{t}_{i,k}}{\delta} = (\gamma_{i2} - \gamma_{i4})\beta + \gamma_{i1}\alpha - \gamma_{i3} \tag{4-5}$$

$\gamma$, as mentioned earlier, is a factor used to increase the degree of parallelism where possible during a search operation. For instance, suppose $\sum_k f^{s}_{i,k}$ is twenty bits, where $f^{s}_{i,k}$ is the number of bits in search field $k$ for a specified $k$. If the search is a fully parallel search such as a less than or equal, equal, greater than, et cetera, then $\gamma$ may usefully be increased up to the limit of the field width, in this case twenty. A similar situation exists for $\delta$ in the term $\sum_k f^{t}_{i,k}/\delta$, where $f^{t}_{i,k}$ is the transfer field width for field $k$. In essence $\delta$ measures the parallelism of the transfer mechanism. Equation 4-5 then represents the manner in which the results are presented.
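The partition in Equation 4-5 can be sketched as a small calculator. The left side counts associative compare (or transfer) cycles by dividing each field width by the parallelism applied to it, and the right side is the budget fixed by $\alpha$, $\beta$ and the $\gamma_{ij}$. Rounding each quotient up to a whole cycle is an assumption of this sketch: the equation itself uses plain division, and the two agree whenever a width is a multiple of $\gamma$ or $\delta$. The twenty-bit field comes from the text; the coefficients passed to `budget` are hypothetical.

```python
import math

def associative_side(search_fields, transfer_fields=(), gamma=1, delta=1):
    """Left side of Equation 4-5: sum of f_s/gamma plus sum of f_t/delta,
    with each field rounded up to whole compare or transfer cycles (an assumption)."""
    search = sum(math.ceil(f / gamma) for f in search_fields)
    transfer = sum(math.ceil(f / delta) for f in transfer_fields)
    return search + transfer

def budget(g, alpha, beta):
    """Right side of Equation 4-5 (identical to the right side of Equation 4-4)."""
    g1, g2, g3, g4 = g
    return (g2 - g4) * beta + g1 * alpha - g3

# A twenty-bit fully parallel search field, as in the text.
for gamma in (1, 4, 20):
    print(gamma, associative_side([20], gamma=gamma))  # 1 20 / 4 5 / 20 1

print(budget((20, 10, 4, 6), alpha=1, beta=0.5))  # 18.0 for these hypothetical coefficients
```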

The graphical results are plotted against alpha as the independent variable. This was based in part on the work of Thurber and Berg [68] and Weinberger [70], both of whom suggested that the memory portion of the computer would be the most critical, particularly where Thurber and Berg pointed out (and Hodges [30] supported) that, from a cost point of view, the associative machine would have to have a slower memory, and hence slower compare times. Therefore the results are presented in such a way as to investigate this prior work to see if a slower associative memory ($\alpha < 1$) would still be competitive on a total time measurement.

To illustrate the preceding discussion, and in particular the equivalent breakeven bit width, more clearly, the following example, listed in Table 4, is presented in detail. A few preliminary remarks are included below as a preamble to the example.

Table 4. Example of $E_q$ Equation Development

Figure 4 delineates fifty-seven composite node cycle and architectural situations. These situations are detailed in Appendix B as follows. Table B-1 specifies the total composite node time for each of the twelve situations for queues. This is the total amount of computational time (in parametric form) necessary to complete the composite node cycle listed, in abbreviation, to the right of the table. Similarly, the thirty-six situations formed from cases one, two and three of priority queues along with their subcases are specified in Table B-2, with the remaining nine situations, for priority queues four, five and six, in Table B-3. The total computational time listed in each of these tables is computed by first listing each algorithm participating in a particular composite node cycle and then obtaining from Table B-4 the total computational time for that algorithm. The total composite node cycle time is then the summation of the separate algorithm times. Table B-4 also lists the figure number in Appendix B that shows the individual algorithmic flow chart. It is possible then, by way of Equations 4-1 through 4-5 and Appendix B, to study how an algorithm or subgroup of instructions affects the outcome of a particular comparison.

To return to the specific example, the top of Table 4 reproduces the entries in Table B-4 appropriate to case one of queues, and in particular to the situation comparing the RAM to the AM. The specific algorithms forming a composite node cycle ($A_1$, $I_1$, $D_1$ and $DA_1$ for the RAM, and the corresponding A, I, D and DA algorithms for the AM) are taken from Table B-1. The total time for each architecture is then summed in Table 4.

The remaining portion of Table 4 develops a side-by-side comparison of Equations 4-1 through 4-5 with the actual numerical example for queue case one. In Equation 4-1 the total time for the RAM is set equal to the total time for the AM. This is the key step in developing the equivalent breakeven bit width concept. By setting the total times equal, the equality is maintained or destroyed by the values of the $\alpha$, $\beta$, $\gamma$, $\delta$ and $\sum_k f_{i,k}$ parameters. The first four relate directly to the hardware in terms of relative processing times, or the degree of parallelism apparent in the associative architectures. As such they eventually relate to hardware cost. The last parameter is the equivalent breakeven bit width, which is permitted to vary as the eventual balancing factor to maintain equality. This balancing process evolves from the rest of the example.

In Equation 4-2 in Table 4 the two times are set equal in expanded form. Equations 4-3a, 4-3b and 4-3c partition the $E_q$ term and set up the parametric ratios. Equation 4-4 introduces the assumption that $T_A = T'_A$, hence $\beta_1 = \beta_2 = \beta$. Equation 4-5 partitions the $E_q$ term into the memory contribution $\sum_k f^{s}_{i,k}/\gamma$ and the transfer contribution $\sum_k f^{t}_{i,k}/\delta$. Note that in this example $E_q$ does not involve a transfer term and, further, the coefficient of $\alpha$ is non-parametric. This is only true because of the simpler nature of queues, discussed in Chapter III.

The final step results from substituting $\alpha = 1$ and $\beta = 0.5$ into the equation. Alpha equal to one means that the time to complete a full word compare in the RAM is equal to the time required to make a bit slice compare in the AM. The result is that, to balance the equation, $E_q$ must equal eleven. The question is, eleven what? The term bit width, or more properly equivalent bit width (EBW), is introduced to answer this question. This is the number of bits necessary to represent the maximum values of the fields necessary to implement a particular composite node cycle on any of the associative architectures. Herein lies the dichotomy between the RAM and the associative architectures: the total processing time of the latter is based on how many bit slices or equivalent bit widths can be processed in parallel,³ while the former depends on efficient node organization. The term equivalent bit width is used because of the potential parallelism of the associative architecture. As discussed previously, the associative architecture may process (search or transfer) in parallel (simultaneously) from one to $M_w$ bits. This is a function of the internal logic. In the specific example just cited, $E_q$ was made up of two terms, $LN/\gamma$ and PK. The former term represents a fully parallel search, where the number of bits participating within each compare cycle is a function of $\gamma$. The search on PK is done by Feng's bit slice search, which is fixed by internal logic at one bit per compare. Thus $\gamma$ is always one for this term and is hence omitted from it, since including it would give an improper connotation. Therefore consider a PK of ten bits. This means ten of the eleven bit widths available to balance the equation have already been used up. This is equivalent to a PK value of $2^{10} - 1$, or 1023. In terms of queue length, this means 1023 nodes within each queue list can be accommodated.


³The associative parallelism is also affected by insufficient memory size to contain the total searched list. However, in this research the memory length is considered adequate.
If only straight bit widths or slices were concerned, there would be one remaining to balance the equation. However, by increasing $\gamma$ from one up to $M_w$, up to $2^\gamma - 1$ lists can be accommodated. Equivalent bit widths then come as a result of modifying gamma, which is the same as modifying the degree of parallelism within the associative architecture. Delta plays a similar role for data transfer involving Lewin's algorithm and memory.
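The bookkeeping in this example can be sketched as follows. With $E_q = 11$ and PK searched one bit slice per compare, each PK bit costs one EBW, a PK of $n$ bits holds values up to $2^n - 1$, and whatever EBWs remain are spent on LN at $\gamma$ bits each. Following the text, $2^w - 1$ is taken as the count a $w$-bit field can hold; the function names are illustrative only.

```python
def max_value(bits):
    """Largest value a field of the given width can hold, as counted in the text."""
    return 2 ** bits - 1

def ln_capacity(eq, pk_bits, gamma):
    """Lists addressable by LN when PK is bit-serial (one EBW per PK bit)
    and each remaining EBW searches gamma LN bits at once."""
    remaining = eq - pk_bits
    if remaining < 1:
        return 0
    return max_value(remaining * gamma)

print(max_value(10))           # 1023: a ten-bit PK
print(ln_capacity(11, 10, 1))  # 1: gamma = 1 leaves a single one-bit slice for LN
print(ln_capacity(11, 10, 6))  # 63: gamma = 6 lets that slice cover a six-bit LN
```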

To complete the example, the number of EBWs actually required by the composite node cycle is now discussed. The formulation discussed up to now in Chapter IV has concentrated on defining and explaining $E_q$. The question as to whether the RAM or one of the associative architectures is superior ultimately depends on whether the bit width required by the particular composite node cycle is less than (associative superiority), equal to (a draw), or greater than (RAM superiority) the $E_q$ dictated by fixing $\alpha$, $\beta$, $\gamma$ and $\delta$. In each case where the number of EBWs is either less than or greater than the number required for equation balancing, the discrepancy converts directly to associative compare times $T'_M$. In terms of the example, if PK were five bits and LN were one bit, the cycle would need six of the eleven EBWs, so each time the composite node cycle was executed $5T'_M$ would be saved in actual running time. Conversely, if PK were ten bits and LN were five bits, fifteen EBWs would be needed, and $4T'_M$ would be lost vis-a-vis the RAM for each composite node cycle. This latter situation could arise for a $\gamma = 1$ associative machine such as STARAN. The graphical results that follow represent a rapid method to ascertain $E_q$ based on parameter fixing or, conversely, given the required bit width for a particular problem, to determine the hardware parameters for an appropriate associative implementation.
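The conversion of an EBW surplus or deficit into associative compare time can be written directly. This sketch assumes, as in the example, that every field is searched one bit per compare ($\gamma = 1$), so each bit costs one $T'_M$. A positive result is time saved per composite node cycle, and a negative one is time lost relative to the RAM. The value of $T'_M$ and the cycle count in the last line are hypothetical.

```python
def margin_in_compares(eq, field_bits):
    """EBW surplus (positive) or deficit (negative) in units of T'_M,
    assuming one bit slice, and hence one T'_M, per field bit."""
    return eq - sum(field_bits)

def time_delta(eq, field_bits, tp_m, cycles=1):
    """Running time saved (positive) or lost (negative) over many node cycles."""
    return margin_in_compares(eq, field_bits) * tp_m * cycles

print(margin_in_compares(11, [5, 1]))   # 5: PK = 5, LN = 1, so 5 T'_M saved
print(margin_in_compares(11, [10, 5]))  # -4: PK = 10, LN = 5, so 4 T'_M lost
print(time_delta(11, [10, 5], tp_m=2, cycles=1000))  # -8000 with hypothetical T'_M = 2
```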

In the graphs, the lines represent a fixing of the $\beta$, $\gamma$, $\delta$ and other algorithmic parameters, plotted against $\alpha$. The left hand scale is then in EBWs. Therefore, for a given $\alpha$, the $E_q$ for equation balancing is read off the vertical axis. The line for the current example is the upper line shown in Figure 7. Determining the actual bit width required by the particular composite node cycle requires an analysis similar to that done for LN and PK in the example. This analysis is done in the context of discrete simulation processing requirements. For instance, in most discrete simulations one would expect priority queue list lengths to be less than one thousand and their number to be perhaps twenty or less. There may be other situations where list lengths might vary widely from these figures and would have to be considered accordingly.

As a final introductory note, two items should be kept in mind. First, the results to follow are based on stated assumptions. As such, the conclusions drawn from them must not be generalized beyond the assumptions without due care. Second, the results are not by themselves intended to prove that one architecture is better than another. Instead they indicate what is required to make such a decision and, where the research assumptions are met and field widths and other parameters are known, permit such a decision to be made.

These  decisions  are  shown  as  decision  areas  I and  II  on  the 
graphs  of  the  following  results.  Region  I is  favorable  to  the  RAM, 
while  region  II  is  favorable  to  the  particular  associative 
architecture  depicted  by  the  graph.  Each  graph  is  labelled  with 
a unique  equation  number  that  is  also  listed  in  the  table  of  E^ 
equations  appropriate  to  the  particular  results  section. 

4.2.  Queues 

The first information structures to be discussed in the results are the queues, the equations for which are shown in Table 5. These information structures operate in an unusual way both naturally and within the random access memory. This is because a queue represents a physical ordering of individual items. In the random access memory this becomes a physical ordering of nodes, which in turn implies that no keys are necessary, because there is no searching for the next item in the queue. The next item in the queue is either the last item put in (LIFO) or the first remaining item left in the queue (FIFO).

This physical ordering represents a problem in implementing a queue within an associative memory, because the memory is basically a parallel search device that selects nodes based on keys. Therefore it was necessary to convert the queue information structure into a key search structure. This was done in two different ways, as indicated in Table 5. The first way was simply to insert a time of entry key in the node (Equations 1, 2, 5 and 6 in Table 5). This is straightforward in a discrete simulation, since there is a simulation clock available. Based on such a key, a LIFO queue can be maintained with a maximum search and a FIFO queue can be maintained with a minimum search. Method one thus converts the queue into a priority queue, as will method two, discussed next.

The second method was to use either one counter for LIFO queues or two counters for FIFO queues (Equations 3, 4, 7 and 8 in Table 5). These methods were based on the fact that, since queues are maintained in physical order, it is possible to number the nodes serially and uniquely in a contiguous fashion. Therefore in the case of LIFO queues, when the time came to remove the next node, an 'equal to' search on the last serial number stored selected the proper node. In the case of FIFO queues, the first counter was used to insert the serial number and a second was used for 'equal to' searches for removal. Two benefits accrued to this second method of maintaining queues in an associative memory. The first was that an 'equal to' search is a completely data parallel search. The second was that, when queues are used in discrete simulation, it is frequently required as part of the data generation function to know at various time points the number of items remaining in the queue. This information is automatically available in the LIFO case and is the difference of the two counters in the FIFO case.
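A minimal sketch of method two follows. It assumes a toy software model of an associative memory in which the loop in `equal_search` stands in for the simultaneous compare of every word. Several queues share one memory and are distinguished by their LN field, and each removal is a single 'equal to' search on LN and the counter value. The class and field names (`ToyAssociativeMemory`, `CounterFIFO`, `CounterLIFO`, `DATA`) are hypothetical, and no timing is modelled.

```python
class ToyAssociativeMemory:
    """Words are dicts of fields; free words are None. The loop in
    equal_search stands in for the parallel compare of every word."""

    def __init__(self):
        self.words = []

    def equal_search(self, **fields):
        return [i for i, w in enumerate(self.words)
                if w is not None and all(w[k] == v for k, v in fields.items())]

    def write(self, word):
        for i, w in enumerate(self.words):
            if w is None:  # reuse a free word
                self.words[i] = word
                return i
        self.words.append(word)
        return len(self.words) - 1

    def read_and_free(self, index):
        word, self.words[index] = self.words[index], None
        return word


class CounterFIFO:
    """FIFO queue: one counter numbers insertions, a second tracks removals."""

    def __init__(self, am, ln):
        self.am, self.ln = am, ln
        self.next_in = 0   # serial number given to the next node written
        self.next_out = 0  # serial number of the node at the head

    def put(self, data):
        self.am.write({"LN": self.ln, "PK": self.next_in, "DATA": data})
        self.next_in += 1

    def get(self):
        if len(self) == 0:
            raise IndexError("queue is empty")
        (index,) = self.am.equal_search(LN=self.ln, PK=self.next_out)  # one responder
        self.next_out += 1
        return self.am.read_and_free(index)["DATA"]

    def __len__(self):
        return self.next_in - self.next_out  # difference of the two counters


class CounterLIFO:
    """LIFO queue: a single counter is both the last serial number and the length."""

    def __init__(self, am, ln):
        self.am, self.ln = am, ln
        self.count = 0

    def put(self, data):
        self.am.write({"LN": self.ln, "PK": self.count, "DATA": data})
        self.count += 1

    def get(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        self.count -= 1
        (index,) = self.am.equal_search(LN=self.ln, PK=self.count)
        return self.am.read_and_free(index)["DATA"]

    def __len__(self):
        return self.count


am = ToyAssociativeMemory()
fifo, lifo = CounterFIFO(am, ln=1), CounterLIFO(am, ln=2)
for item in "abc":
    fifo.put(item)
lifo.put("x")
lifo.put("y")
print(fifo.get(), lifo.get(), len(fifo), len(lifo))  # a y 2 1
```

Because every search key is an exact value, each removal needs only data parallel equality compares. The queue lengths also fall out of the counters without any search.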
Figures 7 and 8 illustrate graphically the results of comparing the two associative approaches with the single random access approach. Note that the equations for LIFO and FIFO queues are the same; therefore only one set is plotted.

The most important result is the role of gamma. To discuss this, consider again the previous numerical example, assuming an operating point of alpha equal to one. For Equation 1 in Table 5 this means that there are eleven memory cycles available for $E_q$. These memory cycles must be apportioned between two field terms, $LN/\gamma$ and PK. Recall that method one uses minimum or maximum searches. These searches are not fully data parallel, because there must be some reconciliation among the words themselves, as opposed to independent comparison with some external value. Feng has shown that such a search can be conducted with gamma equal to one [21]. To increase gamma requires either going to special circuitry for ordered retrieval, such as Lewin's algorithm, or converting the minimum or maximum search to a fully data parallel search. To return to the example, Equation 1 in Table 5 reflects that the selection of the proper queue by the LN key exhibits full data parallelism, since each node is checked independently, which means gamma can be increased beyond one. If, however, gamma is one, and if there are some thirty-two linear lists colocated in the AM, LN needs five bits and there are six bits or EBWs left for PK, which may or may not be acceptable for queue membership (at most 63 members). If gamma were at least the width of LN, LN would take a single EBW, and the ten remaining EBWs would normally be adequate for PK.
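A minimum search of the kind attributed to Feng can be sketched with the general bit-serial technique. The key field is examined one bit slice at a time from the most significant end, and whenever some remaining responder has a 0 in the current slice, every responder with a 1 is dropped. This costs one compare per key bit regardless of how many words take part, which is why PK contributes its full width to $E_q$ when $\gamma = 1$. This is a software illustration of the idea, not Feng's published circuit. Restricting the search first to one LN value follows method one as described above; the sample keys are made up.

```python
def bit_slice_min(keys, width, candidates=None):
    """Return (indices of all words holding the minimum key, slice compares used).

    keys: integer key field of each word; width: key width in bits;
    candidates: optional subset of word indices, e.g. the result of an LN search.
    """
    responders = set(range(len(keys)) if candidates is None else candidates)
    compares = 0
    for bit in range(width - 1, -1, -1):  # most significant slice first
        compares += 1                     # one associative compare per slice
        zeros = {i for i in responders if not (keys[i] >> bit) & 1}
        if zeros:                         # a word with a 1 here cannot be the minimum
            responders = zeros
    return sorted(responders), compares


def bit_slice_max(keys, width, candidates=None):
    """Maximum search by the same slice-at-a-time elimination."""
    responders = set(range(len(keys)) if candidates is None else candidates)
    compares = 0
    for bit in range(width - 1, -1, -1):
        compares += 1
        ones = {i for i in responders if (keys[i] >> bit) & 1}
        if ones:                          # a word with a 0 here cannot be the maximum
            responders = ones
    return sorted(responders), compares


# Method one: PK is a time-of-entry key; LN selects the list by an equality search.
ln = [1, 2, 1, 1]
pk = [5, 3, 7, 6]
in_list_1 = [i for i, v in enumerate(ln) if v == 1]
print(bit_slice_min(pk, 3, in_list_1))  # ([0], 3): FIFO head of list 1
print(bit_slice_max(pk, 3, in_list_1))  # ([2], 3): LIFO top of list 1
```

Words with equal minimum keys all survive as multiple responders. For example, `bit_slice_min([5, 3, 7, 3], 3)` returns both word 1 and word 3.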




However, method two (the counter method) permits gamma to play a role for both LN and PK. Based on Figure 8 (Equations 3 and 7), at alpha equal to one there are seven EBWs. If gamma were equal to the larger of the LN and PK widths, only two memory cycles would be needed. Assume for a moment that gamma was equal to six and LN was equal to six. This means that up to sixty-four lists can be stored in the AM and that there are six EBWs left for $PK/\gamma$. This means PK could take on a value of up to thirty-six bits, a value generally more than adequate for the counter values.

As perhaps a more interesting example, consider the same values as above but with gamma equal to two. Three EBWs would be required for LN, leaving four EBWs, or eight bits, for PK. This means a value of $2^8 - 1$, or 255, for each of the queue counters. Thus, if no queue ever exceeded that number of nodes, the AM using method two would be on par with the RAM. If gamma or alpha were increased, the AM would show an advantage.
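The apportioning arithmetic in the last two examples can be collected into one helper. This sketch assumes that each field costs the ceiling of its width divided by the parallelism applied to it. For method one the PK field is searched with $\gamma = 1$ (a minimum or maximum search); for method two both LN and PK may use the same $\gamma$. The function name is hypothetical.

```python
import math

def pk_bits_available(ebw, ln_bits, gamma_ln, gamma_pk):
    """Widest PK field that fits in the EBW budget after the LN search is paid for."""
    ln_cost = math.ceil(ln_bits / gamma_ln)
    return max(ebw - ln_cost, 0) * gamma_pk

# Method one (Equation 1, 11 EBWs): 32 lists need a five-bit LN; PK is bit-serial.
print(pk_bits_available(11, 5, gamma_ln=1, gamma_pk=1))  # 6 bits, up to 63 members
print(pk_bits_available(11, 5, gamma_ln=5, gamma_pk=1))  # 10 bits once gamma covers LN

# Method two (Equations 3 and 7, 7 EBWs): counters allow gamma on both fields.
print(pk_bits_available(7, 6, gamma_ln=6, gamma_pk=6))   # 36 bits
bits = pk_bits_available(7, 6, gamma_ln=2, gamma_pk=2)
print(bits, 2 ** bits - 1)                               # 8 255
```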

The second result is that the use of the hybrid architecture (AM/RAM) requires only an additional two EBWs relative to the AM. The third result is that the timings for the algorithms are dominated by $\gamma_{i1}$, the coefficient of the alpha term. This means that the preponderance of computer time is taken by memory instructions. In general, changing the algorithm to incorporate parallel search and increasing $\gamma$ materially improves the queue performance of the associative architecture.
The prime interest in studying priority queues within the context of discrete simulation is that they are the information structure used to implement time flow mechanisms based on the VTI method. Unfortunately, the priority queue does not seem to be well suited to the two primary associative architectures considered in this research, the AM and AM/RAM. This problem arises from several sources, which will be discussed prior to presenting the results.

The priority queue is based on selecting the node with the greatest or smallest key value for the search field. For time flow mechanisms, the primary key is the simulation time, which means a search on the smallest time to determine the next potential state change. This means a minimum search, which, as pointed out in the preceding section, is not a fully data parallel search. Unlike the queue, which by its nature permits a straightforward conversion to full data parallelism by the use of counters, the priority queue offers no guarantee that the nodes will be contiguous based on search key values. Two additional sources of difficulty arise. One is that the CDLLL is an efficient structure for representing the priority queue in the RAM when it is coupled with a sort-in
process. Studies by Conway [8], Lave [44], Morgan and Siegel, and Knuth [36] indicate that in general it becomes more efficient with respect to its own overhead as the list grows and as the future state changes become less dense compared to a random list. The latter case of decreasing density seems to be the predominant case in discrete simulation. The last source of difficulty is that the sort-in method used for priority queues in the RAM architecture has a queue built within it. This comes about because the sort-in process forces nodes with equal keys to be separated by a FIFO queue discipline. This last situation brings the dichotomy between the AM and the RAM into sharp relief. The AM composite node cycle time increases linearly with search field width and, as was pointed out in the previous section, the AM requires an extra or artificial key to replace the physical queue order. This means that, to compare favorably, the AM must be able to absorb the extra field width and compete with an efficient RAM process without the benefit of a fully data parallel search technique, at least using the AM and AM/RAM architectures.
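The RAM sort-in described above can be sketched on a circular doubly linked list with a header node. This list organization is an assumption of the sketch, not a transcription of the algorithms in Appendix B. The new node is placed after every node whose key is less than or equal to its own, so equal keys keep FIFO order without any extra key field. The count of nodes passed corresponds to the sort-in factor a used in the results that follow, and the equal-key nodes passed correspond to b. The class names are hypothetical.

```python
class _Node:
    __slots__ = ("key", "data", "prev", "next")

    def __init__(self, key=None, data=None):
        self.key, self.data = key, data
        self.prev = self.next = self  # a lone node is a circular list of one


class SortInList:
    """Priority queue held as a circular doubly linked list with a header node."""

    def __init__(self):
        self.head = _Node()  # header: head.next holds the smallest key

    def insert(self, key, data):
        """Sort in from the front, passing every key <= the new one (FIFO on ties).
        Returns the number of nodes passed."""
        passed = 0
        p = self.head.next
        while p is not self.head and p.key <= key:
            p = p.next
            passed += 1
        node = _Node(key, data)
        node.prev, node.next = p.prev, p  # link the new node in just before p
        p.prev.next = node
        p.prev = node
        return passed

    def pop_min(self):
        node = self.head.next
        if node is self.head:
            raise IndexError("priority queue is empty")
        self.head.next = node.next
        node.next.prev = self.head
        return node.key, node.data


q = SortInList()
print([q.insert(t, name) for t, name in [(5, "a"), (3, "b"), (5, "c")]])  # [0, 0, 2]
print([q.pop_min() for _ in range(3)])  # [(3, 'b'), (5, 'a'), (5, 'c')]
```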

The results are presented in three major categories. The first includes cases 1a, 2a and 3a in Table 6. These results represent a composite node cycle where a single node is selected which corresponds to the next potential state change. The first group of equations in this category (case 1a: Equations 9, 10 and 11) is shown graphically in Figure 9 for the cases where the parameter a is one and five. The parameter a is a sort-in factor, which is the number of nodes expected to precede the new node to be inserted into the priority queue before the new node is placed in its proper ranked order. The results indicate that as the parameter a increases, the value of $E_q$ increases. The value of $E_q$ is probably adequate at a = 5 and alpha = 1 for most simulations based on the GASP II discrete simulation programming package [62]. This package initially allotted sixteen bits to the primary key, which is simulation time. This leaves ten bits for LN in the AM case, gamma notwithstanding. The problem is that the AM case is not necessarily representative, since a separate key, which is a refinement of the primary key, has not been set aside for the FIFO default ranking in case of ties. This additional key would normally be the time of list entry based on the simulation clock, which means that it would be of the same magnitude as PK. This would then require thirty-two bits, with sixteen for PK and sixteen for TK, where TK designates the FIFO queue key. Under these circumstances, the AM probably would not compare favorably, much less the AM/RAM.

The AML was added to the architectural choices to determine if a special searching capability, in this case Lewin's algorithm, would alleviate some of the difficulties mentioned previously. In general the AML architecture requires more overhead than the other associative cases, which means that the graphs shown in Figure 9 are displaced downward from their counterparts. However, note that $E_q$ is now based on a fully parallel search procedure, which means that gamma can be used to reduce the number of EBWs for LN and PK (see Equation 11 in Table 6), resulting in a net associative advantage.

Case 2a in Table 6 represents another common situation in discrete simulation, where an extra field is used to break ties prior to the FIFO default ranking. This is normally called a priority field and is indicated in the equations by SK. An additional RAM parameter, b, also appears, which is another sort-in parameter dealing with the expected number of nodes that have the same value of PK. It counts the number of nodes having the same value of PK through which a new node must be sorted. Case 3a has effectively the same equations as 2a except that, for the AM and AM/RAM architectures, the default field TK has been added. The results are shown in Figure 10, and they indicate, as was mentioned before, that as time increases with field width, coupled with a search method which is not fully data parallel, the AM and AM/RAM are placed at an increasing disadvantage. Increasing b also favors the AM and AM/RAM.

Equations 14 and 17 in Table 6 reflect only two keys, LN and PK. This is because the FTI portion of the F1MV I'M operates only on PK. Figure 10 illustrates the results for Equations 14 and 17 and once again indicates the important role of gamma in making the AM competitive.

Multiple  Identical  responses  or  state  changes  are  considered 
next.  The  equations  representing  this  case  are  listed  in  Table  7 
and  the  results  are  Illustrated  graphically  in  Figure  11  and  12. 

The RAM 'a' and 'b' parameters have been replaced with expected values (over the M retrievals), and a parameter 'M' has been introduced to represent the number of identical state changes. The transfer width parameter delta now appears in the AML equations, since a transfer to the AML must take place. Because the transfers are made by field, each individual field width must be accounted for separately. The series of equations follows the same three cases as the 'a' subcase, in the sense that increasing field width is considered by adding SK and TK respectively.

The general result in the 'b' subcase is that a multiple identical response is favorable to all the associative architectures, because it is not necessary to repeat the composite node cycle for the M-1 additional responses as it would be in the RAM. Therefore, to arrive at the equations, the single node RAM time was increased by a factor of M, while only the allocate, insert and deallocate portions of the three associative architectures were increased by M. In the case of the AML, the transfer step is also included, since it is part of the delete (search) step. Notice also that the vertical scale factor of the graphs has been changed. As in the previous 'a' subcase, the AML curves represent lower overall values for Eq compared to the AM and AM/RAM cases (M = 4 in the graphs). The gamma and delta parameters can now be used to reduce the bit width requirements. Low values of M and b, such as those used here, entirely favor the RAM architecture.

The last case, shown in Table 8, is the most interesting, since it indicates a sharp departure between the AM and the AML. In this category, the RAM, AM and AM/RAM total composite node cycle times were all increased by a factor of M. This means that there is no saving in time for the AM and AM/RAM architectures, as there was in subcase 'b'. However, in the AML case the allocate, insert and deallocate times of the composite node cycle were increased by M as before, but now, instead of the delete time being increased by M, it is increased by an average of only two EbW's. This is the average ordered retrieval time for Lewin's algorithm: since the total ordered retrieval time for M nodes is 2M-1, the average per node is (2M-1)/M, which is always less than two. Hence the delete (search) time no longer increases by a full search for each of the M nodes.
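The scaling rules of subcases 'b' and 'c' can be restated as small cost functions. The sketch below is a simplified model under stated assumptions: `cycle`, `search` and `per_node` are hypothetical unit costs (none come from Tables 7 or 8). Only the 2M-1 total for Lewin's ordered retrieval is taken from the text.

```python
from fractions import Fraction


def ram_total(m: int, cycle: float) -> float:
    """RAM: the full composite node cycle is repeated for each of the M nodes."""
    return m * cycle


def am_subcase_b_total(m: int, search: float, per_node: float) -> float:
    """AM, subcase 'b': one search finds all M identical responders; only the
    per-node allocate, insert and deallocate work is repeated M times."""
    return search + m * per_node


def lewin_average_retrieval(m: int) -> Fraction:
    """Average retrieval time per node when M ordered retrievals take
    2M - 1 time units in total, as stated for Lewin's algorithm."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return Fraction(2 * m - 1, m)


for m in (1, 2, 4, 8):
    print(m, lewin_average_retrieval(m))
```

For M = 4 the average is 7/4 units per node. It approaches, but never reaches, the two EbW's quoted above, while the RAM total keeps growing by a whole cycle per node.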

The poor results of the AM and AM/RAM architectures are plotted in Figures 13 and 14 (Equations 27, 28, 30, 31, 33 and 34). The mild slopes of these plots indicate that only a small number of bits is available to meet all the field requirements listed as part of the equations in Table 8. The problem of the small number of bits is compounded by the lack of parallelism available for primary, secondary and tertiary key searches.

In contrast to the results of the AM and AM/RAM, the AM/RAM/AML architecture results plotted in Figures 13 and 14 (Equations 29, 32 and 35) exhibit fairly steep slopes. This means that a relatively large number of bits is available to satisfy field requirements. Further, by virtue of the FIMV algorithm, these field requirements exhibit full parallelism and hence require fewer bits than the AM and AM/RAM fields. As an example, consider Equation 29, plotted in Figure 13 for M = 4 and a = b = 1. In this case Eq is forty bits. If gamma and delta were as little as two, this would allow twenty bits for each of the four fields that make up the total requirement for Eq. In general this would be more than adequate. For Equations 32 and 35, plotted in Figure 14, the number of fields has increased by one and two respectively over Equation 29. Under the conditions of M = 4 and a = b = 1, there are slightly fewer than forty bits to satisfy the total Eq requirements. This means that gamma and delta must be increased to three and perhaps four to maintain a performance equivalent to that of Equation 29. A more realistic set of parametric values for Equations 32 and 35 would be M = 4, a = …, b = 1 for … = 1. This represents the upper plot of Figure 14. Here seventy bits are available for distribution among five and six fields respectively. A value of two for gamma and delta would be quite adequate.
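The bit-budget arithmetic in the last paragraph can be checked with a one-line rule. The rule is inferred from the Equation 29 example rather than stated in the tables. It assumes that a field w bits wide costs w/γ bit-steps when γ bits are processed in parallel, so a budget of Eq bit-steps shared equally by n fields allows Eq·γ/n bits per field.

```python
def bits_per_field(eq_budget: float, gamma: int, n_fields: int) -> float:
    """Width each of n_fields may have if a w-bit field costs w / gamma
    bit-steps and the total budget is eq_budget bit-steps (assumed model)."""
    return eq_budget * gamma / n_fields


print(bits_per_field(40, 2, 4))   # Equation 29: four fields
print(bits_per_field(70, 2, 5))   # Equation 32: five fields
print(bits_per_field(70, 2, 6))   # Equation 35: six fields
```

Under this reading, forty bit-steps at γ = 2 gives the twenty bits per field quoted for Equation 29. Seventy bit-steps gives 28 and roughly 23 bits per field for five and six fields, which is consistent with the remark that γ = δ = 2 would be adequate.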

The major result, then, is that the FIMV algorithm coupled with the AML architecture does show a definite advantage over both the RAM and the conventional AM and AM/RAM architectures. The b parameter was kept low to favor the random access architecture. In general, increasing b favors the associative architecture by increasing the sort-in time necessary for the RAM.


4.4. Priority Queues, Cases 4, 5, 6

This section covers three algorithms normally available as auxiliary algorithms in discrete simulation. These three are also referred to as random lists, because they represent priority queues that are maintained in the RAM in an unordered fashion. This situation occurs in discrete simulation when it is necessary to search for and select a node in a priority queue that is not ranked on the search field. It also occurs in FTI time flow mechanisms, where the list of future events is not ordered [44, 50].

The equations for the various information structures considered are listed in Table 9. The first random list considered (case 4) is one in which the list is searched to find the first node meeting some criterion. This may not be the only node meeting such a criterion; the possibility of intentionally selecting more than one is covered in case 5. The RAM algorithm is based on the 'c' parameter, which is the expected list depth in nodes before a success. If the last node in the list were the only successful node, then c would equal lj. In the associative case the parameters are LN and PK. Gamma affects both these parameters, since both can be used in a fully data parallel search. Figure 15 illustrates two families of curves: the first is for c = 1, the worst case comparison for the AM, and the second is for c = 5. At alpha equal to one, these two curves yield seventeen and forty EbW's respectively for the AM architecture. At gamma equal to two this becomes thirty-four and eighty EbW's respectively, which even in the worst case would cover a large dynamic range for the primary key.
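The contrast between the two search styles in case 4 can be sketched as follows. The RAM walks the unordered list until the first success, so its cost is the depth c. The associative memory is modelled as a single parallel compare that sets a response bit in every word, after which the first responder is selected. The list contents and the `matches` predicate are hypothetical, and the Python list comprehension only simulates what the hardware would do simultaneously.

```python
from typing import Callable, Optional, Sequence


def ram_first_match(nodes: Sequence[int],
                    matches: Callable[[int], bool]) -> tuple[Optional[int], int]:
    """Sequential search of an unordered list.

    Returns (index of the first success or None, nodes examined); the
    second value is the depth c for this particular list.
    """
    for depth, value in enumerate(nodes, start=1):
        if matches(value):
            return depth - 1, depth
    return None, len(nodes)


def am_first_match(nodes: Sequence[int],
                   matches: Callable[[int], bool]) -> Optional[int]:
    """Associative search: every word is compared at once (simulated here by
    building the full response vector), then the first responder is chosen."""
    responders = [bool(matches(value)) for value in nodes]  # one parallel compare
    return responders.index(True) if True in responders else None


events = [7, 3, 9, 3, 5]
print(ram_first_match(events, lambda v: v == 9))  # success at depth c = 3
print(am_first_match(events, lambda v: v == 9))
```

The RAM cost grows with c, whereas the associative cost in this model is one compare regardless of where the success lies. This is why c = 1 is the worst case for the AM.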

In case 5 (Figure 16) the search is based on finding all nodes that meet some criterion. In the RAM architecture this means that the total list must be searched. This introduces the d parameter, which represents the number of successful nodes. The algorithm for the AM architecture is the same as in case 4. The main difference between case 4 and case 5 is that the AM shows a greater advantage for a given operating point in case 5. In fact, for a simple case with a list length of ten and two successes, Eq has a positive value even at … = 0 (that is, Pm = 0). The predominant effect comes from lj, although increasing d does favor the AM. Again, d and lj were kept low to favor the RAM. This seems to indicate that the greatest advantage accrues to the AM when it searches in a fully data parallel mode while the RAM must search an unordered list sequentially.
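Case 5 differs from case 4 mainly in how much work each side must do. The sketch below returns the work count alongside the result: nodes examined for the RAM (always lj) and responders read out for the AM (d, after one parallel compare). It is a simplified model; the ten-node list with two successes is hypothetical and mirrors the simple case mentioned above.

```python
from typing import Callable, Sequence


def ram_all_matches(nodes: Sequence[int],
                    matches: Callable[[int], bool]) -> tuple[list[int], int]:
    """RAM: the whole unordered list must be scanned, so lj nodes are examined
    no matter how many (d) of them succeed."""
    hits = [i for i, value in enumerate(nodes) if matches(value)]
    return hits, len(nodes)


def am_all_matches(nodes: Sequence[int],
                   matches: Callable[[int], bool]) -> tuple[list[int], int]:
    """AM: one parallel compare sets all d response bits; the responders are
    then read out one at a time, so the count returned is d."""
    responders = [bool(matches(value)) for value in nodes]  # single search
    hits = [i for i, hit in enumerate(responders) if hit]
    return hits, len(hits)


nodes = [4, 8, 1, 8, 0, 2, 6, 3, 9, 5]
print(ram_all_matches(nodes, lambda v: v == 8))  # ten nodes examined
print(am_all_matches(nodes, lambda v: v == 8))   # two responders read out
```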

Cases 4 and 5 also serve as models for FTI TFM's. Case 4 corresponds to selecting the next potential state change within Δt when there is a low probability of multiple state changes, and case 5 corresponds to the situation where multiple state changes can take place in Δt. In essence, then, the changeover suggested by Morgan and Siegel [50] between FTI and VTI unconditional TFM's amounts to changing from the information structure of case 4 to a priority queue (e.g., case 1, 2 or 3) and back again. One question raised about Morgan and Siegel's work by Wickham [71] concerned the amount of overhead involved in switching. This overhead amounts to reordering the list when switching back from a random FTI list to a priority queue. It is interesting to note in this regard that the associative memory always maintains its lists in an unordered fashion, and therefore there is at least no overhead due to ordering when Morgan and Siegel's method is implemented in an associative memory. Since Morgan and Siegel pointed out that the other portion of their scheme involves a simple numerical prediction process, it would seem that the AM would offer a good vehicle for reconsideration of this scheme. The major problem remaining, however, as Emshoff and Sisson point out [16], is that there is a time error and possibly a precedence error in the FTI method. In the case of the time error, Gafarian and Ancker point out that there is always a loss of information about the behavior of the simulated system [25]. For this reason the FIMV time flow mechanism was introduced in the section discussing the priority queue results for cases 1, 2 and 3, since it appears to alleviate all the difficulties mentioned.
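The time and precedence errors of a pure FTI mechanism, and one way of combining fixed increments with minimum-value retrieval, can be sketched on a hypothetical unordered list of (time, name) event notices. `fti_schedule` processes every event found in the window [t, t + Δt) with the clock at t + Δt, in list order. `fimv_schedule` uses the same fixed windows to pick candidates but then retrieves them in minimum-time order at their own times. This second function is an illustrative reading of the concatenated FTI/VTI idea behind the FIMV mechanism, not the algorithm as specified in the research.

```python
Event = tuple[float, str]  # (event time, event name)


def fti_schedule(events: list[Event], dt: float, horizon: float) -> list[Event]:
    """Fixed time increment: every event found in [t, t + dt) is processed,
    in list order, with the clock at t + dt."""
    pending = list(events)
    out: list[Event] = []
    k = 0
    while k * dt < horizon:
        lo, hi = k * dt, (k + 1) * dt
        window = [e for e in pending if lo <= e[0] < hi]  # unordered scan
        for e in window:
            pending.remove(e)
            out.append((hi, e[1]))  # stamped with the clock, not with e[0]
        k += 1
    return out


def fimv_schedule(events: list[Event], dt: float, horizon: float) -> list[Event]:
    """Fixed increments select the candidates (FTI part); within each
    increment the minimum-time event is retrieved first and processed at its
    own time (VTI part)."""
    pending = list(events)
    out: list[Event] = []
    k = 0
    while k * dt < horizon:
        lo, hi = k * dt, (k + 1) * dt
        window = [e for e in pending if lo <= e[0] < hi]
        while window:
            e = min(window)  # earliest event time first
            window.remove(e)
            pending.remove(e)
            out.append(e)
        k += 1
    return out


events = [(2.5, "b"), (0.5, "a"), (2.2, "c")]
print(fti_schedule(events, 1.0, 3.0))
print(fimv_schedule(events, 1.0, 3.0))
```

With Δt = 1 the FTI version reports "a" at 1.0 and both "b" and "c" at 3.0, with "b" ahead of "c" (a time error and a precedence error). The combined version returns a, c, b at their own event times.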

Case 6 is based on minimum or maximum searches of unordered lists. The e parameter introduced here corresponds to the number of interchanges necessary within the RAM algorithm to keep the value of the minimum or maximum current. For the RAM, this algorithm represents the unordered method of maintaining a priority queue. It could also serve as a model for an unordered retrieval based on an FTI TFM. But, as suggested previously, it is generally more economical in the RAM to maintain priority queues by a sort-in process. Figure 17 illustrates some results for Eq, based first on the worst case for the AM, where lj = e = 1, and second on the case lj = 10 and e = 5. Notice that for the latter case Eq again always shows a positive value. Note also that in the above three lists the AM/RAM architecture operates at a disadvantage of only two EbW's relative to the AM.
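The two sides of case 6 can be sketched as follows. The RAM scans the unordered list and counts how often the running minimum is replaced; the convention of counting the first value as one replacement is an assumption, chosen so that a one-node list gives e = 1. The associative side uses the familiar bit-serial minimum search: bit slices are examined from the most significant bit down, and whenever some surviving responder has a 0 in the current slice, the responders with a 1 are dropped. Its cost depends on the key width rather than on list length.

```python
def ram_minimum(values: list[int]) -> tuple[int, int]:
    """Scan a non-empty unordered list; e counts how often the running
    minimum is replaced (the first value counts as one replacement)."""
    best = values[0]
    e = 1
    for v in values[1:]:
        if v < best:
            best = v
            e += 1
    return best, e


def am_minimum(words: list[int], width: int) -> tuple[list[int], int]:
    """Bit-serial associative minimum search, one bit slice per step from the
    most significant bit down. Returns (indices holding the minimum, steps)."""
    responders = set(range(len(words)))
    steps = 0
    for bit in reversed(range(width)):
        steps += 1
        zeros = {i for i in responders if not (words[i] >> bit) & 1}
        if zeros:  # some responder has 0 here, so drop those with 1
            responders = zeros
    return sorted(responders), steps


values = [12, 5, 9, 3, 7]
print(ram_minimum(values))    # minimum 3 found after e = 3 replacements
print(am_minimum(values, 4))  # word 3 holds the minimum, 4 bit-slice steps
```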


4.5.  Parametric  Variations 

In Sections 4.2, 4.3 and 4.4, the intent was to illustrate the method and results pertaining to a particular selection of parameters. In this section the intent is to take a subset of all Eq equations and study the effect of varying alpha (α = 0.75, α = 1.0) and beta (β = 0, 0.5, 1.0) while setting γ = δ = Mw. The RAM parameters are all fixed at their most favorable values or, conversely, at the worst case for the associative architectures. The subset of Eq equations chosen was the one that made the various associative architectures most competitive within each case.

These results are listed in Tables 10, 11 and 12. In each table the particular case is listed, followed by the equation number and then by a column titled Min Eq. This column lists the minimum Eq value necessary (based on γ = δ = Mw) for the associative architecture to function. The next columns list the Eq requirements. These requirements are the minimum necessary to balance the Eq equations, based on the values of α, β and the other parameters listed at the far right of the tables. Therefore, if the minimum Eq in a particular row is less than the Eq requirement, the associative architecture is superior; if the two values are the same, the equation balances at the breakeven value; and if the minimum Eq exceeds the requirement, the RAM is superior.
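The way a row of Tables 10 to 12 is read amounts to a three-way test. The sketch assumes integer bit counts and follows the rule as stated: a smaller minimum Eq than the requirement favors the associative architecture, equality is breakeven, and the remaining case goes to the RAM. The row values below are hypothetical, not taken from the tables.

```python
def compare_row(min_eq: int, eq_requirement: int) -> str:
    """Classify one table row by comparing Min Eq with the Eq requirement."""
    if min_eq < eq_requirement:
        return "associative superior"
    if min_eq == eq_requirement:
        return "breakeven"
    return "RAM superior"


for row in [(16, 20), (18, 18), (24, 20)]:  # hypothetical (Min Eq, requirement)
    print(row, compare_row(*row))
```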

Queue results are listed in Table 10. In ten out of the twelve cases for α = 1, the associative architectures, AM and AM/RAM, are superior to the RAM. In the other two cases, Equations 4 and 8 for β = 1.0, they are equal to the RAM. If the associative compare performance is reduced by 25% with respect to the RAM (α = 0.75), the associative architectures are superior in only two out of twelve cases (β = 0.0). In the remaining ten cases the associative architectures are all inferior. This indicates that α plays a significant role, as suggested earlier.

Priority Queues 1, 2 and 3 are covered in Table 11. All the results presented in Table 11 are based on the AM/RAM/AML architecture, and the a and b parameters all represent the worst case for this architecture. M is still left as a parameter.

For α = 1.0, in all but two cases the AM/RAM/AML architecture is clearly superior to the RAM. In the other two cases, Equations 20 and 29 for β = 1.0, M = 2 is sufficient to make the AM/RAM/AML architecture superior. Even for a 25% reduction of relative compare time for the AM/RAM/AML architecture, all but one of the outcomes under subcases 1a, 2a and 3a are superior to the RAM. The remaining case, Equation 11 at β = 1.0, ties with the RAM.

With respect to subcases b and c under the 25% reduction, in all but two outcomes M = 2 is sufficient to make the AM/RAM/AML superior to the RAM. In the remaining two outcomes, Equations 20 and 29 at β = 1.0, M = 3 is sufficient for the same result. Since M = 4, as referred to in Section 4.3, is considered to be a fair value for a discrete simulation, the AM/RAM/AML can easily sustain a 25% reduction in speed and, in selected cases with a higher value of M, even greater reductions. This illustrates another type of trade-off available: moving to a slower memory while increasing the processing width.

Table 12 lists the results for the remaining priority queues. The results for Priority Queues 4 and 5 indicate that the AM or AM/RAM architecture is superior to the RAM at both α = 1.0 and α = 0.75. Priority Queue 6 still relies on the single-bit-at-a-time minimum search, so PK is listed separately. For α = 1.0, the AM and AM/RAM architectures should both be superior, assuming sixteen bits as an upper bound for PK. The 25% reduction does generally make the RAM superior. However, in an actual implementation the techniques of Priority Queue 1a should be substituted for those of Priority Queue 6.

The role of β, as seen in Tables 10, 11 and 12, is one of biasing the results in favor of one of the architectures. Note, however, that the bias is not always in the same direction (e.g., Table 11). This comes about because the preponderance of overhead (non-memory instructions) sometimes lies with the associative architecture and sometimes with the random access architecture. Therefore β, as a measure of the influence of this overhead, plays an important although not pre-emptive role in determining architecture behavior.
4.6. Summary


The main results are the various equations associated with the queue and priority queue information structures. These equations provide a basis for considering not only the usefulness of the various architectures but also the trade-offs between memory speed and data parallelism. Since the equations represent a rich source of information, the approach used in presenting the results was to illustrate the use of the equations by way of selected graphs and examples pertaining to discrete simulation, rather than through an exhaustive discourse. Specific conclusions and recommendations are presented in the next chapter.
CHAPTER V

CONCLUSIONS AND RECOMMENDATIONS

5.1.  General  Conclusions 

The general concern of this research is the utility of the associative memory for non-numeric processing in discrete simulation. The particular concern is priority queues, since they form the basis of the time flow mechanisms and are inherent in many simulation models. The research indicates that, under its assumptions and constraints, the associative memory can process priority queues more efficiently than the random access memory. It further indicates that hybrid memories (such as the random access memory used in conjunction with the associative memory) are promising not only in terms of performance but also in terms of relative cost.

The research selected or created parametric hardware and software models that could be matched to each other to process a variety of priority queues. The hardware models consisted of a random access memory; an associative memory; a random access memory in conjunction with an associative memory; and a random access memory in conjunction with an associative memory plus an auxiliary memory implementing Lewin's minimum retrieval algorithm.

The key hardware parameters were the ratio of the random access memory speed to the associative memory speed (α); the ratio of the random access non-memory instruction time to the associative non-memory instruction time (β); and the degree of parallelism in transferring data between the various memories when they were used in a hybrid fashion, expressed as the number of bits simultaneously active (delta). The software parameters relate to the individual priority queues; the main ones are list length (lj) and the number of consecutive nodes retrieved in a given search (M).

The first queues studied were the simple LIFO and FIFO queues. They lend themselves only moderately well to associative memories. The major problem in fitting simple queues into associative memories is obtaining a high degree of parallelism during the search for the next node. To obtain this parallelism, the slow single-bit-at-a-time minimum associative search was converted into an "equal to" fully parallel associative search by changing the algorithm. Once this conversion was made, and assuming the maximum value of gamma, the associative memory and the associative memory combined with the random access memory proved to be good for alpha equal to or greater than one (Table 10). For alpha less than one the random access memory is definitely superior. The use of Lewin's minimum search algorithm is inappropriate here, even though the algorithm controls the growth of processing time with list length, because the performance of the simple queues in the random access memory is already independent of list length (lj).
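One plausible way to turn the FIFO "next node" search into an "equal to" search is sketched below. It is an assumption, not necessarily the algorithm used in the research. Each node is stamped with a sequence number when it is inserted, and a separate head counter is kept. The next node is then the single word whose sequence field equals the counter, so no minimum search is needed and the cost does not depend on list length. The Python list only simulates the associative words; a LIFO variant would search for `tail - 1` instead.

```python
class AssociativeFifo:
    """FIFO queue on a simulated associative memory using equality search."""

    def __init__(self) -> None:
        self.words: list[tuple[int, str]] = []  # (sequence field, payload)
        self.tail = 0  # sequence number given to the next inserted node
        self.head = 0  # sequence number of the node to remove next

    def push(self, payload: str) -> None:
        self.words.append((self.tail, payload))  # any free word would do
        self.tail += 1

    def pop(self) -> str:
        # "Equal to" search on the sequence field: at most one responder.
        responders = [i for i, (seq, _) in enumerate(self.words)
                      if seq == self.head]
        if not responders:
            raise IndexError("pop from empty queue")
        _, payload = self.words.pop(responders[0])
        self.head += 1
        return payload


q = AssociativeFifo()
for item in ("a", "b", "c"):
    q.push(item)
print(q.pop(), q.pop())  # first in, first out
```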

The next group of priority queues deals with priority queues used as time flow mechanisms. Three versions of priority queues are considered in relation to situations normally found in discrete simulation. The associative memory, and the associative memory combined with the random access memory, proved to be of marginal utility compared to the random access memory alone. The combined associative, random access and Lewin memory did show great utility when coupled with the fixed increment minimum value time flow mechanism algorithm. The results indicate that this associative memory combination can be 25%, and in some cases 50%, slower than the random access memory and still show an overall processing time advantage. It is also possible to reduce the degree of parallelism (gamma) from the maximum for alpha less than one without losing the associative advantage.

This permits a certain degree of trade-off in design parameters to offset overall cost. The FIMV TFM also seems to offer advantages, discussed later, for combining various types of discrete simulation TFM's into a unified model and simulation view.

The remaining priority queues studied were all concerned with various auxiliary operations within discrete simulation. In these cases the associative memory, and the associative memory combined with the random access memory, showed a marked advantage. This would permit good flexibility in design trade-offs. Lewin's algorithm was not investigated here because of the success of the conventional associative techniques.

In all priority queue cases studied, the associative memory in one of the forms studied was equal to or better than the random access memory.
Conclusions About the Methodology

The methodology used in the research attempted to study the associative architecture in a complete way. Each individual algorithm was worked out and then combined with other algorithms to form composite node cycles. Total time measurements were then applied. In this way the fine structure of the computational process could be studied. For instance, the introduction of beta permitted a consideration of the contribution of non-memory instructions. In the case of queues, beta quickly plays a decisive role in switching the superiority from the AM to the RAM at alpha less than one. The implication is that more research may be needed on preparing the data prior to search. In general, the use of the MIX computer approach quickly brought into focus the various areas that needed attention in the research.

Comparison to Other Work

As mentioned at the outset, there was no direct work in this area. Vaucher and Duval did consider other RAM algorithmic methods for priority queues in discrete simulation. In their work they chose a particular implementation that did not use dynamic allocation but used a fixed list length. With some reservations, however, a limited comparison is possible. For queues, they found that the circular double linked linear list (CDLLL) was best for controlling time growth with list length for lengths of less than ten nodes. Subsequent techniques yielded better control of time growth, but always took more time with increasing length. The queue techniques for the associative architecture used in this research were superior to the CDLLL; further, their processing time is independent of list length. For Vaucher and Duval's priority queues, priority queue case 1a seemed to be the closest match. This case assumes a single node selection. Further, the list name search portion is dropped because Vaucher and Duval used single lists.

This means that only the first part of the FIMV algorithm is used.

The result is that the AM/RAM/AML is superior to the CDLLL used by Vaucher and Duval. Since all the other algorithms they used show a positive time growth with list length, and the technique used in this research does not, the associative architecture is definitely superior for maximum gamma and possibly for lesser values as well.

5.2.  General  Recommendations 

There are several extensions to this research that are necessary to complete the assessment of associative architecture for discrete simulation. The first of these concerns the number of words in the memory. This research assumes that the memory was always big enough to contain the problem. This did not seem unreasonable in discrete simulation, since information is being created and destroyed rather than only created and stored.

Secondly, it is assumed that the RAM also has sufficient memory. An investigation should be undertaken to consider memory overflow for both types of machines in relation to each other. Another area of concern is the use of non-uniform node size dynamic allocation schemes. Knuth [36] points out that this is a very complex area for the RAM. A third area involves an expansion of Vaucher and Duval's work so that a better comparison can be made with more complex RAM algorithms. A fourth area would be a consideration of a few long words versus many short words. A preliminary look at this situation as part of the research indicated no clear-cut advantage either way, so for the sake of memory compatibility the short word was used. The exception was the memory for Lewin's algorithm, because of the manner in which that algorithm works.

It is unlikely that research extended into these other areas can be done using exactly the same methodology used here. Instead, it is recommended that an emulator be established for the RAM based on MIX and for the AM based perhaps on STARAN. Following this, formal simulations of the parts relating to non-numeric processing should be set up to measure the relative performance of the machines with respect to the areas above.

As a final recommendation on simulation, and perhaps the most interesting one, the merger of discrete and continuous simulation should be considered. This suggestion is based on the remark attributed to Gordon [26] that the FTI TFM would support either discrete or continuous simulation, whereas VTI would support only discrete simulation. Since the research demonstrated that an efficient TFM could be constructed by concatenating the FTI and VTI methods, it seems reasonable to suppose that the first part of the FIMV algorithm could support continuous simulation, while the total algorithm would come into use for the discrete case. If this merger were possible, simulation could be used as a single efficient entity without regard to partitioning it into special categories of system representation. One straightforward approach would be to apply the methodology of this research to GASP IV […1], a FORTRAN-based simulation system which permits discrete and continuous phenomena to be modeled as a single system.
APPENDIX A. ASSOCIATIVE MEMORY


A simplified associative or content addressable memory (the terms are used interchangeably) is shown in Figure A-1. Content addressability is used as follows. Data encoded in binary form are stored in each memory word by a load operation such that the same type of information is vertically aligned. As an example, consider billing information in which the first M bits of a word are reserved for the name, the next N bits for the address, and the last K bits for the net amount owed; each group of bits is called a field. To locate the account information of a particular person, the comparand register is loaded with the name and the mask register disables all but the first M bits. An "equal to" search is conducted, and the word which matches the proper name has its response bit set. The information can then be read out, updated, et cetera. A random access memory would need some indirect referencing scheme to locate the proper account and would generally be slower.
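The comparand and mask mechanism just described can be simulated with integer words. In the sketch below the field widths, the integer name codes and the account values are all hypothetical. The name occupies the high-order bits, the mask enables only the name field, and every word whose unmasked bits equal the comparand gets its response bit set. Real hardware compares all words simultaneously, while the list comprehension here compares them one after another.

```python
M, N, K = 8, 8, 8  # hypothetical field widths in bits: name, address, amount


def pack(name: int, address: int, amount: int) -> int:
    """Lay out one word as name | address | amount (name in the high bits)."""
    return (name << (N + K)) | (address << K) | amount


def field_mask(shift: int, width: int) -> int:
    """Mask register value enabling `width` bits starting at bit `shift`."""
    return ((1 << width) - 1) << shift


NAME_MASK = field_mask(N + K, M)


def equal_search(memory: list[int], comparand: int, mask: int) -> list[bool]:
    """Set a response bit for every word that equals the comparand on the
    enabled (unmasked) bits."""
    return [(word & mask) == (comparand & mask) for word in memory]


memory = [pack(3, 10, 50), pack(7, 11, 20), pack(9, 12, 75)]
print(equal_search(memory, pack(7, 0, 0), NAME_MASK))  # only word 1 responds
```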

Now consider a slightly more complex situation, in which it is desired to locate all people who owe less than a specified number of dollars. Again the comparand is loaded with the search criterion (that dollar amount), and the mask register disables all but the last K bits. A "less than" search is then conducted on all words in parallel, and the responders are set for all words that satisfy it. The same task on a random access memory would be considerably more complicated than the first task. However, owing to the content addressability of the associative system, this task is conceptually no more complicated than the first, and both tasks are simpler than even the first task conducted with a random access memory.

Consider now a third example dealing directly with discrete system
simulation. A primary activity in discrete simulation is the time
sequencing of model state changes. For each potential state change
that must take place in a discrete simulation, there is a state
change notice composed of the following parts, or fields (in the
context of a content addressable memory). The first field is the
event time or time of occurrence, while the second field is the
event type, which determines which subpart of the model is to be
exercised at the event time. The remaining fields are
characteristics or attributes that provide additional information
germane to that particular event, such as event priority, the last
time the event occurred, system resources necessary for successful
event completion, et cetera. Consider that all the event notices are
stored in the content addressable memory and that the model must
determine the next event under the following operating rule: find
the next event of type "A" that possesses an attribute three (value
of field 3) greater than "B" and an attribute five (value of field
5) less than "L". The process results in a complex search where
fields one, three and five are set respectively to A, B and L in the
comparand. The mask register disables all but fields one, three and
five. The search would then typically proceed from left to right
across the memory as follows: search on A to isolate the words
containing events of type A; then search on greater than B among
the words surviving the first search, and AND the result with a
search on less than L. Those words (event notices) surviving the
search would have their responders set.
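The event-notice search can be sketched the same way, with each notice held as a tuple of fields and each field search ANDed into the running response store. This is a hypothetical Python model: the field values, the numeric codes standing for A, B and L, and the helper name `field_search` are assumptions, and it follows the comparand layout in the text (fields one, three and five).

```python
def field_search(notices, field, predicate, previous=None):
    """One parallel search on a single field of every notice.
    If `previous` is given, the new flags are ANDed with it (concatenation)."""
    hits = [predicate(notice[field]) for notice in notices]
    if previous is not None:
        hits = [old and new for old, new in zip(previous, hits)]
    return hits


A, B, L = 1, 5, 4          # hypothetical codes/values for the operating rule

# Five-field notices; index 0 is field one, index 2 field three, index 4 field five.
notices = [
    (A, 0, 7, 0, 2),       # type A, attribute three 7 > B, attribute five 2 < L
    (A, 0, 3, 0, 1),       # fails: attribute three not greater than B
    (2, 0, 9, 0, 1),       # fails: wrong event type
    (A, 0, 6, 0, 8),       # fails: attribute five not less than L
]

rs = field_search(notices, 0, lambda v: v == A)        # search on A
rs = field_search(notices, 2, lambda v: v > B, rs)     # AND greater than B
rs = field_search(notices, 4, lambda v: v < L, rs)     # AND less than L
print(rs)                  # [True, False, False, False]
```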

This simple example illustrates the usefulness of an associative
memory for search operations, which comprise a large proportion of
the actual execution of a discrete simulation. It should be pointed
out that an associative processor can be considered to be an
associative memory with arithmetic capability at each word.

What has been described above may be considered an explanation of
a conventional associative array memory such as STARAN [63],
although STARAN is actually an associative processor. There are two
general differences between STARAN and the associative architecture
used in this research. These differences are discussed briefly
below.

First, information is not stored horizontally by fields but
vertically in nodes, with one word allocated for each field. In
terms of the last example, A might be stored in the first word, B in
the second, and L in the third. The search would begin with A, and
all successes would be recorded in the response store. The search
would then continue with a separate search instruction, which would
cause the response store result to be shifted down one word so that
the result of the search on A could be concatenated with the search
on B, and then shifted again for a separate search on L.
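The vertical (one field per word) organization and the shifted response store can be modeled as follows. This is a hypothetical sketch, not the hardware of the research: nodes are assumed to be three consecutive words [type, attribute three, attribute five], and only the first word of a node is allowed to respond to the initial search, a role the actual design assigns to busy bits.

```python
def concatenated_search(memory, node_width, predicates):
    """Search a vertically stored memory one field (word) at a time.
    After each search the response store is shifted down one word and
    ANDed with the next search, so flags survive only along matching nodes."""
    # Initial search: only the first word of each node is eligible.
    rs = [i % node_width == 0 and predicates[0](w) for i, w in enumerate(memory)]
    for pred in predicates[1:]:
        shifted = [False] + rs[:-1]                 # shift response store down one
        rs = [s and pred(w) for s, w in zip(shifted, memory)]
    # Each surviving flag sits on the last searched word of a matching node.
    return [i // node_width for i, flag in enumerate(rs) if flag]


A, B, L = 1, 5, 4                                    # hypothetical search values
memory = [1, 7, 2,   1, 3, 1,   2, 9, 1,   1, 6, 8]  # four nodes, three words each
preds = [lambda w: w == A, lambda w: w > B, lambda w: w < L]
print(concatenated_search(memory, 3, preds))         # [0]
```

Only node 0 survives: node 1 fails the search on B, node 2 has the wrong type, and node 3 fails the search on L.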
Second, the search process takes place one or more bits at a time.
For instance, suppose each word were twenty-four bits long. Then the
search could proceed from left to right (highest order bit to lowest
order) γ bits at a time, where γ could be one, two, ...,
twenty-four. Therefore if γ were two, it would take twelve memory
interrogations to complete the search for A or B or L, which means
thirty-six memory interrogations for all three. The search takes
place in a regular parallel fashion, which means that the same bits
in each word are compared at the same time with the comparand. Gamma
can vary from one up to the word width for all comparand searches,
that is, searches where each word is compared with the comparand
independently of the others, such as a search for greater than or
less than. In cases such as finding the maximum or minimum, other
procedures must be used; they are discussed in the body of the
dissertation.
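A γ-bit-at-a-time comparand search can be sketched by keeping, for every word, whether it is still equal to the comparand so far. Each pass of the outer loop corresponds to one memory interrogation over all words; the function name and the three-state bookkeeping are assumptions made for illustration. With twenty-four-bit words and γ = 2 it reproduces the twelve interrogations per field, thirty-six for A, B and L, given in the text.

```python
import math


def digit_serial_less_than(words, comparand, width, gamma):
    """Compare all words with the comparand gamma bits at a time, highest-order
    digit first. Returns (response store for 'less than', interrogations)."""
    state = [0] * len(words)   # 0: equal so far, -1: word < comparand, +1: word > comparand
    interrogations = 0
    shift = width
    while shift > 0:
        step = min(gamma, shift)                 # the last digit may be narrower
        shift -= step
        digit_mask = (1 << step) - 1
        c = (comparand >> shift) & digit_mask
        interrogations += 1                      # one pass over all words
        for i, w in enumerate(words):            # done in parallel by the hardware
            if state[i] == 0:
                d = (w >> shift) & digit_mask
                if d < c:
                    state[i] = -1
                elif d > c:
                    state[i] = 1
    return [s == -1 for s in state], interrogations


flags, n = digit_serial_less_than([5, 10, 3], comparand=5, width=24, gamma=2)
print(flags, n)                                  # [False, False, True] 12
print(3 * math.ceil(24 / 2))                     # 36 interrogations for A, B and L
```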


APPENDIX B. ALGORITHMS


B.1. General

This appendix discusses the various algorithms, and their
associated timings, that were used to develop the results of
Chapter IV. Additional background material germane to the use of
the algorithms, such as additional conditions, is also discussed
along with the algorithms. Each algorithm is discussed individually
along with its flow chart. The MIX documentation [37] should be
reviewed before studying the algorithms. Before the individual
algorithms, four summary tables are discussed below.

Table B-1 tabulates the total composite node cycle time for each
architecture studied for queues. The four cases discussed in the
table are the same four cases, by number, discussed in Chapters III
and IV. On the right hand side of the table are listed the
algorithms that make up the composite node cycle for that
architecture. Each algorithm so referenced can be looked up in
Table B-4 for its individual timing and the figure that gives that
algorithm's flow chart. The abbreviations for the algorithms are A
for allocate, I for insert, S for search, D for delete, and DA for
deallocate; primes indicate associative algorithms.

Tables B-2 and B-3 contain the same type of information as
Table B-1. Table B-2 covers priority queue cases 1, 2 and 3 and
also lists the subcases discussed in Chapter III, which are covered
again in more detail in the discussions of algorithms D'9, D'10 and
D'11. Where the same composite node cycles apply to subcase b or c
as they do to subcase a, the reference subcase is entered in the
composite node cycle column instead of relisting the cycle members.

Table B-1. Composite Node Cycle Times (Queues)

Table B-2. Composite Node Cycle Times (Priority Queue 1, 2, 3)

*Indicates delete algorithm not used since the node address is
known by virtue of the search algorithm.

Table B-3. Composite Node Cycle Times (Priority Queue 4, 5, 6)

Table B-3 lists the balance of the cases for priority queues. This
table contains the same type of information as the previous two
tables.

Additional algorithmic information is listed in Table B-4. The
first part of Table B-4 lists all the RAM algorithms (unprimed),
with the balance of the table used to list associative algorithms
(primed). The appropriate figure for each algorithm is given, and
the subcases discussed above are cross-referenced in the remarks
column.

The flow charts use standard MIX notation, with the instruction
time included in the lower part of the instruction symbol.

B.2. MIX-RAM Algorithms

This section contains flow diagrams and descriptions for algorithms
A1, DA1, I1, I2, I3, D1, D2, S1, S2, and S3. These algorithms are
adapted from Knuth, Volume I, Sections 2.2.3 and 2.2.5 [30].
Although these algorithms assume that LLINK and RLINK are stored in
word zero (first word) of the node, no difference in timing occurs
if each link occupies a separate word in core.

Algorithm A1

This algorithm, shown in Figure B-1, is designed to allocate a node
of a uniform number of words. The algorithm is independent of the
eventual use of the node. The algorithm assumes that there is a
stack of available nodes maintained in a singly linked list called
AVAIL. The timing corresponds to the case where all available
storage is placed in AVAIL at the outset. It represents the shorter
time case and is the one that will be used in the comparisons.

Table B-4. Algorithm Timings (page 2)

Algorithm DA1

Algorithm DA1 deallocates a node after it has been unloaded.
Deallocation is independent of node use and simply places the node
at the top of the AVAIL stack. It is shown in Figure B-2.
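A1 and DA1 amount to a pop and a push on the AVAIL stack. The following is a minimal Python sketch, assuming nodes are identified by index and AVAIL is threaded through a separate link array; the class and method names are hypothetical.

```python
class NodePool:
    """Uniform-size nodes kept on an AVAIL stack threaded as a singly linked list."""
    NIL = -1

    def __init__(self, n_nodes, words_per_node):
        self.words = [[0] * words_per_node for _ in range(n_nodes)]
        # All storage placed in AVAIL at the outset: node i links to node i + 1.
        self.link = list(range(1, n_nodes)) + [self.NIL]
        self.avail = 0 if n_nodes > 0 else self.NIL

    def allocate(self):
        """In the spirit of A1: pop the top node of AVAIL."""
        node = self.avail
        if node == self.NIL:
            raise MemoryError("AVAIL is empty")
        self.avail = self.link[node]
        return node

    def deallocate(self, node):
        """In the spirit of DA1: push the node back on top of AVAIL."""
        self.link[node] = self.avail
        self.avail = node


pool = NodePool(n_nodes=3, words_per_node=2)
a = pool.allocate()         # 0
b = pool.allocate()         # 1
pool.deallocate(a)
print(pool.allocate())      # 0: the most recently freed node is reused first
```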

Algorithm I1

Algorithm I1 (Figure B-3) inserts a node represented by register six
to the left of node rI1. This means that a node is somehow
determined and its address inserted in register one. The most
common means of doing this is to enter register one with a list name
in the case of a queue (FIFO). This algorithm then inserts the
current node at the left end of the queue. For this reason the
dotted enter block is included. This same algorithm can be used for
insertion into a queue (LIFO) by changing the link field
specifications in the instructions.
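Knuth-style insertion with LLINK and RLINK (Volume I, Section 2.2.5) can be sketched as below. Under the assumed reading convention (the list is read, and emptied, from the head along RLINK), inserting to the left of the list head yields FIFO order, and exchanging the roles of the two link fields yields LIFO order, mirroring the remark about changing the link field specifications. Node numbering and method names are hypothetical.

```python
class LinkedNodes:
    """Circular doubly linked lists with list heads, LLINK/RLINK style."""

    def __init__(self, n_nodes):
        # Every node (including list heads) starts out linked to itself.
        self.llink = list(range(n_nodes))
        self.rlink = list(range(n_nodes))

    def insert_left(self, x, p):
        """Insert node p immediately to the left of node x (x may be a list head)."""
        self.rlink[p] = x
        self.llink[p] = self.llink[x]
        self.rlink[self.llink[x]] = p
        self.llink[x] = p

    def insert_right(self, x, p):
        """The same operation with the link fields exchanged."""
        self.llink[p] = x
        self.rlink[p] = self.rlink[x]
        self.llink[self.rlink[x]] = p
        self.rlink[x] = p

    def traverse(self, head):
        """Node order read from the head along RLINK."""
        order, p = [], self.rlink[head]
        while p != head:
            order.append(p)
            p = self.rlink[p]
        return order


fifo = LinkedNodes(4)              # node 0 is the list head
for node in (1, 2, 3):
    fifo.insert_left(0, node)
print(fifo.traverse(0))            # [1, 2, 3]: oldest first

lifo = LinkedNodes(4)
for node in (1, 2, 3):
    lifo.insert_right(0, node)
print(lifo.traverse(0))            # [3, 2, 1]: newest first
```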

Algorithms I2 and I3

Algorithm I2 is used to insert a node into a priority queue. The
only difference between it and algorithm I3 is that the latter
algorithm includes a priority field that is used as part of the
sort-in process. The priority algorithm, I3, is shown embedded
within I2 by the addition of the dotted steps; these steps are
simply eliminated for I2. Another way to look at these algorithms is
to consider them an extension of I1 by the addition of the sort-in
process.

I2 (Figure B-4) inserts the node in such a manner that if two nodes
have the same value for the sort-in key, the new node is placed
behind the existing list node. This results in the default ranking
of FIFO. I3 works the same way, except that the default ranking does
not take place until after ranking first on the sort-in key and
then on the priority key.
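The sort-in step of I2 and I3 can be sketched as a linear scan that stops at the first node ranking strictly after the new one, so equal keys keep arrival (FIFO) order. In this hypothetical Python sketch a Python list stands in for the linked list, and for I3 a smaller priority value is assumed to rank first; the text does not fix that direction.

```python
def sort_in(queue, node, use_priority=False):
    """Insert `node` into `queue`, which is kept in ranked order.
    Ties go behind existing nodes, so FIFO is the default ranking.
    use_priority=False mimics I2; use_priority=True mimics I3."""
    def rank(n):
        return (n["key"], n["priority"]) if use_priority else (n["key"],)

    new_rank = rank(node)
    i = 0
    while i < len(queue) and rank(queue[i]) <= new_rank:   # skip equal ranks too
        i += 1
    queue.insert(i, node)


arrivals = [{"key": 5, "priority": 2, "id": "a"},
            {"key": 3, "priority": 9, "id": "b"},
            {"key": 5, "priority": 1, "id": "c"}]

q2, q3 = [], []
for n in arrivals:
    sort_in(q2, n)
    sort_in(q3, n, use_priority=True)
print([n["id"] for n in q2])   # ['b', 'a', 'c']: tie on key 5 kept in FIFO order
print([n["id"] for n in q3])   # ['b', 'c', 'a']: tie broken by priority first
```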

Algorithms D1 and D2

Algorithm D1 in Figure B-5 is used to delete the first node of a
selected list. The only difference between D1 and D2 (Figure B-6) is
that D1 includes a list empty check, since D2 assumes a prior
search.

Algorithms S1, S2 and S3

These three algorithms comprise the searches that will be used for
comparison. S1 and S2 are shown in Figure B-7. S1 is intended to
find the first list member whose search key satisfies the comparison
criterion. The search is based on priority queue case four, as
mentioned earlier in the discussion of algorithms in Chapter III. S2
is used to find all the list nodes whose search key satisfies the
comparison criterion and then to tag these nodes. The additional
instructions for S2 are shown in the dotted boxes.

S3, in Figure B-8, is designed to find the maximum or minimum value
among the nodes of a random list based on the search key.

These searches are singled out with a separate designator because,
as will be seen, they do not have the data parallelism that S1 and
S2 have in terms of associative processing.
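On a random access memory all three searches walk the list one node at a time. This Python sketch uses a plain list of search keys in list order as a hypothetical stand-in for the linked nodes. It shows the distinction drawn above: S1 and S2 test each node independently against a fixed criterion, whereas S3 must carry a running best value from node to node, which is the cross-word reconciliation that limits its data parallelism on an associative memory.

```python
def s1_first(keys, criterion):
    """S1: position of the first node whose key satisfies the criterion."""
    for i, k in enumerate(keys):
        if criterion(k):
            return i
    return None


def s2_tag_all(keys, criterion):
    """S2: positions of every node whose key satisfies the criterion (tagged)."""
    return [i for i, k in enumerate(keys) if criterion(k)]


def s3_extreme(keys, find_max=False):
    """S3: position of the minimum (or maximum) key; needs a running best."""
    if not keys:
        return None
    best = 0
    for i in range(1, len(keys)):
        better = keys[i] > keys[best] if find_max else keys[i] < keys[best]
        if better:
            best = i
    return best


keys = [7, 3, 9, 3]
print(s1_first(keys, lambda k: k < 5))                     # 1
print(s2_tag_all(keys, lambda k: k < 5))                   # [1, 3]
print(s3_extreme(keys), s3_extreme(keys, find_max=True))   # 1 2
```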


B.3. MIX-AM, MIX-AM/RAM, and MIX-AM/RAM/AML Algorithms

This section describes the following algorithms: A'1, A'2, A'3,
I'1, I'2, I'3, D'1, D'2, D'3, D'4, D'5, D'6, D'7, D'8, D'9, D'10,
D'11, DA'1, DA'2, S'1, S'2, S'3 and S'4. Before individual
descriptions of the algorithms are given, there are some general
remarks that apply to all algorithms.

Each algorithm assumes the storage structure discussed in Chapter
II, Figure A, appropriate to the particular memory organization.
Times are shown for each individual instruction, with the exception
of Lewin's algorithm, as noted later. The only difference between
the instructions used for these algorithms and those for the random
access memory is in the search or compare commands. The associative
commands can deal with several words at a time. They are followed by
a single or double slash, depending on whether the response store
register should be initialized or left with the previous result so
that the result of the next compare can be concatenated with that of
the last compare. At the conclusion of the compare command it is
assumed that the address of the first responder (lowest numbered
memory location) is made available in index register one, since
such a compare is always followed by a test on index register one to
see if the list was empty.
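The single-slash and double-slash compares, together with the convention that index register one receives the lowest responding address, can be modeled as follows. In this hypothetical Python sketch `None` stands in for MIX's minus zero, and the word layout (a list name field `ln` and a key field `pk`) is assumed.

```python
MINUS_ZERO = None   # stand-in for "minus zero": no responders


def compare(memory, predicate, previous=None):
    """previous=None models a single-slash compare (fresh response store);
    passing the earlier store models a double-slash compare (ANDed with it).
    Returns (response store, rI1), rI1 being the lowest responding address."""
    hits = [predicate(word) for word in memory]
    if previous is not None:
        hits = [old and new for old, new in zip(previous, hits)]
    ri1 = next((addr for addr, hit in enumerate(hits) if hit), MINUS_ZERO)
    return hits, ri1


memory = [{"ln": "Q2", "pk": 1}, {"ln": "Q1", "pk": 8},
          {"ln": "Q1", "pk": 4}, {"ln": "Q1", "pk": 2}]

rs, ri1 = compare(memory, lambda w: w["ln"] == "Q1")       # single slash
print(ri1)                                                 # 1
rs, ri1 = compare(memory, lambda w: w["pk"] < 5, rs)       # double slash
print(ri1)                                                 # 2
print(compare(memory, lambda w: w["ln"] == "Q9")[1])       # None: list empty
```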


Algorithms A'1 and A'2

The allocation schemes for the AM and AM/RAM memory configurations
are straightforward; however, some background is needed on the
configuration of the memory system prior to the start of the
simulation. Before the simulation starts, a one-time memory setup
similar to that used for the random access memory is established.
In the RAM case all available core memory was linked together based
on the number of words necessary for each node. (Keep in mind that
uniform node width is used throughout the research.) In the
associative cases a determination is made as to how many words per
node are necessary for the AM alone, or how many for the AM and how
many for the RAM in the combined configuration. A process is then
undertaken which places within each AM node a zero in the first-word
busy bit location and an X (based on three-state logic) in the busy
bit location of every node word other than the first. Further, the
random access memory node that is to serve as the companion or buddy
to the AM node has the location (address) of its first word stored
in the RAM node address field (RNA) of the appropriate AM word (see
the storage structure in the main body of the text). In this way a
search on the AM immediately reveals the address of the buddy RAM
node in the combined configurations. This works because the last
step of any compare in the AM is to place the address of the first
(lowest address) responder in index register one if there are any
responders, and a minus zero if there are none. Normal MIX indexing
then permits access to the RNA, since the AM can be addressed either
in parallel or by word.

For allocation (algorithms A'1 and A'2, Figure B-9), which is done
by a minimum compare on the busy bit, the first available AM node
address is placed in index register one. An extra step is required
to place the RNA into index register two for the combined memory
case. During any initial compare (single slash) the X states in the
busy bit lock out those words, acting as a busy bit zero. This is
also used in general throughout all algorithms, since the list name
field stored in the first AM word is always queried first, and this
is the word that always has a zero or one in the busy bit for normal
AM operation. For a follow-on concatenated search (double slash) the
X states act as a one in the busy bit location. The balance of the
algorithm uses standard MIX commands to store a one in the busy bit
location of the allocated node, followed by a return jump.
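Allocation by busy bit can be sketched as below. The model is hypothetical: each AM word carries a busy field holding 0, 1 or the third state X; only the first word of a node holds 0 or 1 together with the RNA of its buddy RAM node; and since a one-bit minimum compare yields the busy-bit-zero words whenever any exist, the sketch simply searches for busy bit zero, with X words locked out of the initial compare.

```python
X = "X"   # third logic state held in the busy bit of every non-first node word


def build_am(n_nodes, words_per_node, ram_base, ram_words_per_node):
    """One-time setup: first word gets busy = 0 and the buddy RAM address (RNA);
    the remaining words of each node get busy = X."""
    am = []
    for n in range(n_nodes):
        for k in range(words_per_node):
            first = k == 0
            am.append({"busy": 0 if first else X,
                       "rna": ram_base + n * ram_words_per_node if first else None})
    return am


def allocate(am, combined=False):
    """A'1 (combined=False) or A'2 (combined=True). Returns (rI1, rI2)."""
    # X never equals 0, so non-first words cannot respond to this compare.
    ri1 = next((addr for addr, w in enumerate(am) if w["busy"] == 0), None)
    if ri1 is None:
        return None, None                       # no available node
    ri2 = am[ri1]["rna"] if combined else None  # extra step for AM/RAM
    am[ri1]["busy"] = 1                         # mark the node active
    return ri1, ri2


am = build_am(n_nodes=3, words_per_node=2, ram_base=1000, ram_words_per_node=4)
print(allocate(am, combined=True))   # (0, 1000)
print(allocate(am, combined=True))   # (2, 1004)
```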

Algorithm A'3

This algorithm is used to support the MIX-AM/RAM/AML memory
organization. It is somewhat more complicated than the others
because of the particular way the delete algorithms based on Lewin's
algorithm operate. The following discussion is based on Figure B-10.

The first instruction provides for the subroutine linkage.

The next instruction compares register X with the value ST2.
Consider that there is a variable called ST4 stored in register X;
the compare command therefore compares ST4 to ST2. The outcome of
the compare, as tested by the JLE command, determines whether the
new node is to be drawn from the AM or the AML. To understand the
physical meaning of this choice, consider that this algorithm is
used only for priority queues and for entries of new schedulable
state changes that result from a state change selected from the AML.
ST2 represents the greatest time or primary key of any state change
entry within the AML, and ST4 represents the state change time or
primary key of a new state change spawned by a state change notice
within the AML whose state change time was less than or equal to
ST2. Therefore, if ST4 is less than or equal to ST2, the new entry
must be filed in the AML to avoid a timing error in the state change
process.

Assume that ST4 is greater than ST2. Then a simple allocate based on
a minimum search of the busy bit field (bb) is made, followed by a
list empty check based on the entry to index register one. If the
register is negative, indicating no more available nodes, an exit is
made. If there is at least one available node, the companion random
access node address (RNA) previously established for each AM node is
loaded into index register two, and the busy bit is set to one by
the enter and store commands, indicating an active node; this is
followed by a jump return.

In the alternate case, where ST4 is less than or equal to ST2, an
allocate process takes place within the AML. At this point the buddy
system between the AM and the auxiliary RAM core cannot be used to
store the new entry. Therefore the AML has stored within one of its
fields the address of a random access memory node (set aside for
each word within the AML), and this address is transferred to index
register two. In this way the AML functions in the same manner as
the AM with regard to auxiliary RAM core. The balance of the
algorithm deals with setting the busy bit within the AML (bbl),
followed by the jump return.
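The AM-versus-AML decision in A'3 reduces to comparing ST4 with ST2. Below is a hypothetical Python sketch, assuming each memory's free nodes are kept as (address, RAM node address) pairs in address order. The empty check is applied to both branches here, although the text describes it only for the AM branch.

```python
def allocate_a3(st4, st2, am_free, aml_free):
    """Sketch of A'3: a spawned state change whose time ST4 does not exceed
    ST2 (the latest time already moved into the AML) must go into the AML;
    otherwise an ordinary AM node and its buddy RAM node are used."""
    if st4 <= st2:
        memory, free = "AML", aml_free
    else:
        memory, free = "AM", am_free
    if not free:
        return None                      # no available node: exit
    address, ram_node = free.pop(0)      # lowest address responds first
    return memory, address, ram_node


am_free = [(0, 100), (2, 104)]
aml_free = [(0, 500)]
print(allocate_a3(st4=12, st2=10, am_free=am_free, aml_free=aml_free))  # ('AM', 0, 100)
print(allocate_a3(st4=9, st2=10, am_free=am_free, aml_free=aml_free))   # ('AML', 0, 500)
```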


Algorithms I'1 and I'2

These algorithms are shown in Figures B-11 and B-12, respectively.
Algorithm I'1 is straightforward and is used to insert the list name
into the selected node. It is assumed that not all operating data
germane to a simulation is included in the composite node cycle
timing; only the data needed for node management in a particular
memory implementation is included. As an example, a list name is not
needed explicitly for the RAM memory but is needed for the
associative cases. On the other hand, linkage information is needed
for the RAM case but not for the associative cases.

The next algorithm, I'2, is a further example of the above comments.
Here the additional information of time of entry (stored in register
X) must be inserted for certain data cycles involving queues, since
physical location establishes queue order in the RAM, but the AM
requires some search parameter.

Algorithm I'3

This algorithm, shown in Figure B-13, is used to insert the
additional information needed to support the counter scheme
(discussed in Chapter III) for maintaining queues. The major problem
with using the associative memory is that a search or compare based
on a minimum is not particularly parallel, in the sense that there
must be some reconciliation across the words and each word cannot be
treated independently. Feng [21] has developed parallel minimum
searches for an associative memory, but they operate on one bit
slice at a time and therefore take w bit slices if the search field
is w bits wide. One way out of this linear growth of search time
with search field width is to convert the minimum search into a
comparand search, where each word may be treated independently and
the processing width can advantageously be increased beyond one.
This is possible in queues such as the ones considered here because
they are "pure" queues, in the sense that entries are always made at
one end of the queue and removals take place either at the same end
or at the opposite end. Entries are never removed from some
intermediate location, as they normally would be in the priority
queue. For this reason counters can be used to serial number the
entries as follows. For the LIFO queue, a counter maintains the last
serial number given to any entry (largest value). When the last
entry is to be removed (see the delete algorithm), a comparand
search on equal for the last value can be made and the single proper
node retrieved. The counter is then incremented or decremented
depending on whether an entry is going in or coming out. For a FIFO
queue two counters are maintained, one for entry and one for
removal.
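The linear cost of a bit-slice minimum search is easy to see in a sketch. The Python function below is a generic bit-serial minimum search of the kind referred to above, not necessarily Feng's exact procedure. It examines one bit slice per step from the high-order end and, whenever some candidate has a 0 in that slice, discards the candidates having a 1, so a w-bit field always takes w slices.

```python
def bit_slice_minimum(words, width):
    """Return (addresses holding the minimum, bit slices examined)."""
    candidates = [True] * len(words)
    slices = 0
    for bit in range(width - 1, -1, -1):        # high-order slice first
        slices += 1
        zeros = [c and not (w >> bit) & 1 for c, w in zip(candidates, words)]
        if any(zeros):                          # keep only candidates with a 0 here
            candidates = zeros
    return [i for i, c in enumerate(candidates) if c], slices


print(bit_slice_minimum([6, 3, 5, 3], width=4))   # ([1, 3], 4)
```

The slice count grows with the field width regardless of how many words are searched, which is the cost the counter scheme avoids by turning queue removal into an equality search.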

This algorithm, as mentioned, adds the additional information to the
associative node to make the counter scheme effective. Notice that
this scheme has more overhead, in the sense of more instructions,
than the minimum search; however, the additional time is more than
made up during deletion when the processing width (γ) is greater
than one. The algorithm starts with the subroutine linkage and then
inserts the list name information. The counter is then loaded,
incremented and stored in the proper associative node. A jump return
completes the insertion process.

Algorithms D'1 and D'2

These algorithms are shown in Figure B-14. They complement algorithm
I'3 in that they are the other half of the counter scheme for
queues. The only difference between them is that D'2 has the extra
step for the AM/RAM memory to load the RNA. They start with the
subroutine linkage, which is followed by the selection of the list
members by list name. A test on list empty is made, followed by the
additional instructions necessary to support the counter. Cj is used
to denote the single counter in the LIFO case or the second counter
in the FIFO case (see I'3). The search is made on equal for the
counter, after which the counter is decremented in the LIFO case and
incremented in the FIFO case. The counter is then restored to
memory, the RNA is loaded if required, and a jump return is made.
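The counter scheme of I'3, D'1 and D'2 can be sketched as follows. This is a hypothetical Python model, not the MIX code: associative words are (list name, serial number, payload) tuples, per-list counters are kept in dictionaries, and the equality search that replaces the minimum search is written as a list comprehension.

```python
from collections import defaultdict


class CounterQueues:
    """Queues in an associative memory whose entries carry serial numbers, so
    that removal is an equality (comparand) search rather than a minimum search."""

    def __init__(self, fifo=True):
        self.fifo = fifo
        self.words = []                          # (list_name, serial, payload)
        self.last_in = defaultdict(int)          # serial given to the latest entry
        self.next_out = defaultdict(lambda: 1)   # FIFO only: serial to remove next

    def insert(self, list_name, payload):        # in the spirit of I'3
        self.last_in[list_name] += 1
        self.words.append((list_name, self.last_in[list_name], payload))

    def delete(self, list_name):                 # in the spirit of D'1
        target = self.next_out[list_name] if self.fifo else self.last_in[list_name]
        hits = [i for i, (ln, serial, _) in enumerate(self.words)
                if ln == list_name and serial == target]   # search on equal
        if not hits:
            return None                          # list empty
        _, _, payload = self.words.pop(hits[0])
        if self.fifo:
            self.next_out[list_name] += 1        # removal counter moves forward
        else:
            self.last_in[list_name] -= 1         # single LIFO counter moves back
        return payload


fifo = CounterQueues(fifo=True)
lifo = CounterQueues(fifo=False)
for item in ("a", "b", "c"):
    fifo.insert("Q", item)
    lifo.insert("Q", item)
print(fifo.delete("Q"), fifo.delete("Q"))   # a b
print(lifo.delete("Q"), lifo.delete("Q"))   # c b
```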

Algorithms D'3, D'4, D'5, D'6, D'7 and D'8

This family of algorithms is shown in Figures B-15, B-16 and B-17.
They are considered together because each pair can be considered a
superset of the previous pair. Within each pair the odd number is
for the straight associative case and the even number for the AM/RAM
case.

These algorithms form the basis for the straight implementation of
the three versions of the priority queue considered in the research.

Figure B-17A. Algorithms D'7, D'8


In the first case there is a single primary key, which is used to
select the minimum node of the list. In the second and third cases
one additional key at a time is added to form a joint primary key
with the first primary key (PK). In this sense the secondary key,
SK, and the tertiary key, TK, are not independent search keys as in
a data management system. Their purpose in discrete simulation is to
break ties in deciding the next state change. Both the SK and TK may
be thought of as adding lower order bits to the PK to increase the
resolution among state change notices in the priority queue. In
general, for discrete simulation PK may be thought of as the
simulation time when the particular state change should take place,
and the secondary key may be thought of as a priority key for tie
breaking. TK exists implicitly within the RAM structure, since the
default ranking as part of the sort-in process is FIFO. To maintain
this same capability within the AM, TK must be explicitly present in
the form of a time-of-entry simulation time. Therefore three keys
(four if the list name field is counted) are needed in the AM case,
where only two are necessary in the RAM case. The major problem is
that the priority queue must be maintained by a minimum search whose
time increases linearly with search width. A possible solution to
that problem is discussed later in conjunction with Lewin's
algorithm.
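Treating SK and TK as lower-order bits of PK can be checked numerically. Packing the keys into one integer makes a single minimum search rank notices exactly as a lexicographic comparison of (PK, SK, TK) would. The field widths, the sample notices, and the convention that smaller values rank first in every key are assumptions chosen for illustration.

```python
def joint_key(pk, sk, tk, sk_bits, tk_bits):
    """Append SK and then TK below PK, as extra low-order bits."""
    if not (0 <= sk < (1 << sk_bits) and 0 <= tk < (1 << tk_bits)):
        raise ValueError("secondary or tertiary key does not fit its field")
    return (((pk << sk_bits) | sk) << tk_bits) | tk


# (PK = event time, SK = priority, TK = time of entry); hypothetical values.
notices = [(10, 2, 5), (10, 1, 7), (10, 1, 6), (12, 0, 1)]
packed = [joint_key(pk, sk, tk, sk_bits=4, tk_bits=8) for pk, sk, tk in notices]

winner = packed.index(min(packed))
print(notices[winner])       # (10, 1, 6): PK tie broken by SK, then by TK (FIFO)
```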

All six algorithms follow the same pattern, so only the first two
will be discussed. The algorithms start with the subroutine linkage,
followed by a selection of the list members and a list empty check.
A search is then made on the primary key (note the double slash for
a search concatenated with the list member selection). The first
responder has its address loaded into index register one and, if
required, the RNA is loaded also. A jump return completes the
process. The remaining four algorithms simply add concatenated
searches for SK and TK, respectively. However, at the end of every
intermediate search a test is made on the match indicator to
determine whether there is only one survivor. This is done because
within a discrete simulation the extra keys are not expected to be
needed for every data cycle, and since each additional minimum
search costs time linear in its width, it would be wasteful to use
them unconditionally.
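The D'3 to D'8 pattern, a primary-key search followed by SK and TK searches only while more than one survivor remains, can be sketched as below. The dictionary layout of a notice and the function name are hypothetical, and each minimum is computed directly rather than by bit slices.

```python
def staged_minimum(notices, keys=("pk", "sk", "tk")):
    """Narrow the survivors with one minimum search per key, testing after each
    search whether a single survivor remains. Returns (address, searches used)."""
    survivors = list(range(len(notices)))
    if not survivors:
        return None, 0                           # list empty
    searches = 0
    for key in keys:
        best = min(notices[i][key] for i in survivors)
        survivors = [i for i in survivors if notices[i][key] == best]
        searches += 1
        if len(survivors) == 1:                  # match indicator: one survivor
            break
    return survivors[0], searches                # first (lowest address) responder


notices = [{"pk": 5, "sk": 1, "tk": 3},
           {"pk": 4, "sk": 2, "tk": 1},
           {"pk": 4, "sk": 1, "tk": 2}]
print(staged_minimum(notices))                   # (2, 2): SK breaks the PK tie
print(staged_minimum(notices[:2]))               # (1, 1): PK alone suffices
```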

Algorithms D'9, D'10 and D'11

These algorithms have been specially designed to alleviate the
problem mentioned earlier in conjunction with priority queues, that
is, the linear increase in time with increasing search width. The
algorithms, shown in Figure B-18, are based on Lewin's algorithm,
mentioned in Chapters III and IV. Lewin's algorithm has the
particular property that one can guarantee an upper bound on the
number of memory cycles necessary for ordered retrieval regardless
of the field width, assuming contiguous bits. In terms of the
research, this means that a breakeven point exists between doing a
straight minimum search and doing Lewin's algorithm. Utilizing
Lewin's algorithm, however, requires slightly different thinking
from the straight node cycle concept, which is designed to trace a
single node through a birth and death process. This is because,
although one can make a statement about retrieving M nodes in order,
one cannot make the same statement about retrieving the first, then
the second, and so forth. One could substitute an average of two
memory cycles per retrieval (identification), but this might not be
accurate under certain conditions that could occur. Finally, there
is additional programming associated with various follow-on aspects
of the selection of an entry in the AML via Lewin's algorithm that
must be considered and timed. For these reasons, the concept of the
composite node cycle has been modified slightly to consider three
situations in terms of this research.

These three situations can be described as follows. The first
situation is the normal composite node cycle, where the initial PK
search returns only one node (cases 1a, 2a and 3a in Table B-4). The
second situation occurs when the initial PK search returns M
identical nodes; the timing listed in Table B-4 for cases 1b, 2b and
3b reflects the amount of time to select the nodes. The third
situation occurs when the initial PK search returns M dissimilar
nodes (cases 1c, 2c and 3c in Table B-4). These and other situations
will be discussed after an explanation of the algorithms.

These algorithms differ from previous algorithms in several ways
beyond those mentioned above. First, they have named entry points,
shown in dotted squares. Second, they are longer. Third, they do
involve data transfer, since data is transferred between memories.
They start with the same preamble as before: the subroutine linkage
followed by the list selection and list empty steps. The next step
is to clear the flag stored in index register five. This flag is
used to select the proper deallocate sequence in DA'. At this point
the actual algorithm starts. It is assumed that the actual
simulation time, PK, is maintained in register X. The next step then
increments register X by some amount, delta t. Before proceeding,
note that four times are involved in understanding this algorithm.
The first, ST1, is the current simulation time. ST2 is the sum of
ST1 and delta t, an arbitrary time increment. ST3 is the state
change time of the earliest state change lying between ST1 and ST2.
And ST4 is the state change time of any new state changes spawned by
the state change occurring at ST3.

At this point in the algorithm register X contains ST2. A search on
less than or equal is made on PK, which returns as responders all
state changes occurring within delta t. If no responders are
present, this fact is detected by the JMIA jump instruction, which
cycles the program through another delta t. If there is only one
responder, this is detected by the next jump instruction, and this
completes the instruction sequence for subcase a, described above.
Subcase a terminates with an SCC1 exit, which indicates to the TFM
control mechanism that there is only one responder. The timing in
Table B-4 is therefore the same for subcases 1a, 2a and 3a, since the
decision on a single responder is made by the fixed increment
portion of the FIMV TFM, which is concerned only with LN and PK.
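The delta t windowing at the start of D'9 to D'11 can be sketched as a loop that advances ST2 until a less-than-or-equal search on PK finds responders. This Python model is hypothetical: PK values are held in a plain list, delta t is assumed positive, and subcase b is taken to mean that all responders share one PK value while subcase c means their PK values differ.

```python
def scan_window(pk_values, st1, delta_t):
    """Advance ST2 = ST1 + k * delta_t until some PK <= ST2.
    Returns (ST2, responder addresses, subcase)."""
    if delta_t <= 0:
        raise ValueError("delta t must be positive")
    if not pk_values:
        return None                    # list empty: handled before the loop
    st2 = st1
    while True:
        st2 += delta_t                 # register X := X + delta t
        responders = [i for i, pk in enumerate(pk_values) if pk <= st2]
        if responders:                 # otherwise cycle through another delta t
            break
    if len(responders) == 1:
        subcase = "a"                  # single responder
    elif len({pk_values[i] for i in responders}) == 1:
        subcase = "b"                  # M responders with identical PK
    else:
        subcase = "c"                  # M responders with dissimilar PK
    return st2, responders, subcase


print(scan_window([7, 12, 30], st1=0, delta_t=5))    # (10, [0], 'a')
print(scan_window([11, 11, 30], st1=0, delta_t=5))   # (15, [0, 1], 'b')
print(scan_window([11, 12, 30], st1=0, delta_t=5))   # (15, [0, 1], 'c')
```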

If there are multiple responders, a parallel transfer of the primary
key field to the AML occurs, using the modified MOVE instruction.
The length of time for this parallel transfer is a function of i,
the transfer width parameter. Parallel transfer is used in the sense
that all bits of the same field in all selected nodes (selected in
the search) are transferred at one time. At this point the
differences among the three algorithms appear. Additional MOVE
instructions are inserted for the SK and the TK, respectively, if
they are present. The algorithm could be rearranged so that these
additional transfers occur only if the PK cannot be used for
resolution; but without some empirical experience with an actual
problem, this approach seemed best. It can now be seen why the AML
is configured differently from the AM or RAM, in the sense that it
has a few very wide words. The width is required so that all key
bits can be contiguous, preserving the property that retrieval time
is independent of the width. Further, additional information must be
contained, although not searched, in the form of the RNAL and the
AMA, the latter being the associative memory address, so that a
reference can be made back to the buddy block of the AM node being
transferred. Only a few words are needed because one does not
expect, on the average, a very dense state change situation to
exist, and if it did, delta t could be reduced as appropriate. As an
aside, it should also be pointed out that in a queuing situation
with a batch service discipline, where one would expect several
state changes to be serviced simultaneously, it was necessary to
treat them separately. Each node could be serial numbered uniquely,
and this serial number can be added to any other key to create a
non-duplicate situation.
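A sketch of one AML entry, assuming a Python dataclass as a stand-in for one wide AML word: the PK, SK and TK fields and a hypothetical serial number take part in comparisons in that order, while RNAL and AMA are carried along but excluded from comparison, mirroring information that is contained although not searched. The field types and the global serial counter are illustrative assumptions.

```python
from dataclasses import dataclass, field
from itertools import count

_serial_source = count(1)  # hypothetical source of unique node serial numbers


@dataclass(order=True)
class AMLEntry:
    pk: float      # primary key: state change time
    sk: int = 0    # secondary key, if present
    tk: int = 0    # tertiary key, if present
    # Unique serial number appended as the last key field, so no two entries tie.
    serial: int = field(default_factory=lambda: next(_serial_source))
    rnal: int = field(default=0, compare=False)  # carried, never compared
    ama: int = field(default=0, compare=False)   # AM buddy address, never compared
```

Because dataclass ordering compares the comparable fields lexicographically in declaration order, two entries with the same PK, SK and TK are still distinguished by their serial numbers, which is the non-duplicate situation described above.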


After the completion of all the field moves, Lewin's algorithm is
started just past the second entry point (FIMV2). The algorithm
proceeds until the first stop occurs at the completion of the first
partition (see Wolinsky [73] or Feng [19] for a detailed
explanation). At this point a list empty check is made, mainly for
later cycles using the second entry point. For subcase b, where
multiple identical responses occur, Lewin's algorithm detects this
fact in one compare cycle, not 2M-1. The next instruction sets the
deallocate flag to minus zero to indicate that there is at least one
entry left in the AML. The next step loads the AMA field into index
register two, but it is not known whether this particular selected
node was spawned by a previous AML state change such that its state
change time (a previous ST4) fell below ST2. If it was such a node,
the AMA field would be zero, either from the initial clearing of the
memory or from a subsequent deallocation algorithm (DA2). Then
either RNA or RNAL would be loaded. The last step before branching
is to notify the control algorithm whether there are multiple
responses within the AML. The last step shown is the normal return
for situation one. In practice, the first use of any of the three
retrieval algorithms under subcase c obtains the first minimum, that
is, the next state change. This state change is then processed
through the remainder of the composite node cycle. The TFM then
returns to the same retrieval algorithm via the FIMV2 entry point to
select the next minimum. The composite node cycles proceed until all
M nodes are recovered. Therefore Lewin's algorithm is timed at 2M-1
compares, and each following instruction is counted M times for
Table B-4.
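The cycle of obtaining one minimum, processing it and returning for the next can be illustrated with a bit-serial minimum search over the remaining AML words. This is the plain slice-by-slice minimum search, swapped in for illustration, and not Lewin's algorithm itself: as the text describes, Lewin's algorithm works by partitioning, whereas this sketch restarts a full-width search for every minimum and so does not reproduce the 2M-1 compare count. The function names and the fixed-width integer keys are assumptions.

```python
from typing import List


def bit_serial_minimum(keys: List[int], active: List[bool], width: int) -> List[int]:
    """Indices of all active words holding the minimum key.

    One bit slice at a time, most significant first: if any candidate has a 0
    in the current bit, every candidate with a 1 there is dropped.
    """
    candidates = [i for i, a in enumerate(active) if a]
    for bit in reversed(range(width)):
        zeros = [i for i in candidates if not (keys[i] >> bit) & 1]
        if zeros:
            candidates = zeros
    return candidates  # more than one index means identical minimum keys


def retrieve_in_order(keys: List[int], width: int) -> List[List[int]]:
    """Recover all M entries, one minimum (or group of equal minima) per cycle."""
    active = [True] * len(keys)
    order: List[List[int]] = []
    while any(active):
        group = bit_serial_minimum(keys, active, width)
        order.append(group)
        for i in group:
            active[i] = False  # consumed by the composite node cycle
    return order
```

For keys 5, 3, 7, 3 in three-bit words, the two words holding 3 respond together on the first cycle, followed by 5 and then 7.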

Situation two, M identical nodes, mentioned above, terminates with a
jump to SCC2, while situation three, M dissimilar nodes, jumps to
SCC3. Other situations can occur, and although they are not
addressed in the formal research because of their complexity, they
are mentioned here for completeness. (Follow-on work is needed
here.)

One situation can occur where a node is inserted into the AML before
it is emptied, a situation discussed previously. In this case
Lewin's algorithm could either be restarted or perhaps simply
continued, since it can be guaranteed that the new entry will have a
key greater than or equal to any existing key. The second situation
comes up when there are some duplicates among the AML entries.

Based on Lewin's algorithm and Wolinsky's proof [73], the algorithm
could proceed in the same manner, but instead of partitioning
individual entries it will partition groups of equals along with
groups of singles. Duplicates are tested for anyway by the match
indicator jump. Additional comments regarding the resolution of
multiple responses with Lewin's algorithm can be found in Feng [19].
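Partitioning groups of equals along with singles can be sketched by splitting each responder set on successive key bits, smaller side first, and emitting a set once it has a single member or no bits remain. This recursive bit partitioning is an illustration under assumed fixed-width integer keys; it is not a transcription of Lewin's procedure or of Wolinsky's proof, and the function name is hypothetical.

```python
from typing import List, Tuple


def ordered_groups(keys: List[int], width: int) -> List[List[int]]:
    """Ordered retrieval by bit partitioning; equal keys come out as one group."""
    result: List[List[int]] = []
    # Stack of pending partitions: (member indices, next bit to examine).
    stack: List[Tuple[List[int], int]] = [(list(range(len(keys))), width - 1)]
    while stack:
        members, bit = stack.pop()
        if not members:
            continue
        if len(members) == 1 or bit < 0:
            result.append(members)  # a single, or a group of equal keys
            continue
        zeros = [i for i in members if not (keys[i] >> bit) & 1]
        ones = [i for i in members if (keys[i] >> bit) & 1]
        # Push the 1-side first so the 0-side (smaller keys) is popped first.
        stack.append((ones, bit - 1))
        stack.append((zeros, bit - 1))
    return result
```

For keys 6, 1, 6, 4 the partitions yield the single 1, then the single 4, then the two words holding 6 as one group, so a duplicate pair costs no more partitioning than one entry would.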

Algorithms DA1 and DA2

Algorithm DA1 is simply STZ 0,1(bb), which clears the busy bit in
the AM. Algorithm DA2 is considerably more complicated and is
designed to work in conjunction with the AML algorithms. The
algorithm is shown in Figure B-19 and starts by setting up the
subroutine linkage. This is followed by a flag test to determine
whether situation one is in effect, which requires only a simple
deallocation of the AM. Otherwise the AML node is deallocated and
AMA is loaded into index register two from the AML. A test is made
to determine whether this address is zero, indicating that the node
was spawned into the AML directly without being transferred from the
AM. If it was not transferred, the deallocation process is complete.
If it was transferred, then the old node in the AM must be
deallocated along with clearing the AMA field. The algorithm
terminates with a jump return.
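The control flow of the two deallocation algorithms can be sketched as below, with a dictionary and lists standing in for the AM busy bits and the AML. The Memories layout and the function names are hypothetical, and AM address 0 is assumed never to hold a node, so that an AMA of zero can mean "spawned directly into the AML" as in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Memories:
    """Hypothetical stand-in for the AM and AML state touched by deallocation."""
    am_busy: Dict[int, bool] = field(default_factory=dict)  # AM address -> busy bit
    aml_busy: List[bool] = field(default_factory=list)      # AML slot -> in use
    aml_ama: List[int] = field(default_factory=list)        # AML slot -> AMA (0 = none)


def deallocate_am(mem: Memories, am_addr: int) -> None:
    # Simple deallocation: clear the AM busy bit (the STZ on the bb field).
    mem.am_busy[am_addr] = False


def deallocate_with_aml(mem: Memories, situation_one: bool,
                        am_addr: int, aml_slot: int) -> None:
    if situation_one:
        deallocate_am(mem, am_addr)  # only a simple AM deallocation is needed
        return
    mem.aml_busy[aml_slot] = False   # deallocate the AML node
    ama = mem.aml_ama[aml_slot]      # load AMA from the AML entry
    if ama == 0:
        return                       # spawned directly into the AML: done
    deallocate_am(mem, ama)          # free the old buddy node in the AM
    mem.aml_ama[aml_slot] = 0        # and clear the AMA field
```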

Search Algorithms S1, S2, S3 and S4

The search algorithms are set up in the same manner as their random
access counterparts. The three options of find first (comparand),
find all (comparand) and find first minimum or maximum are shown in
Figure B-20. Although there are three options, there are four
algorithms, S1, S2, S3 and S4. This occurs as a result of the nature
of the associative architecture, as follows. Search algorithms S1
and S2 are used for find first and find all for the AM and AM/RAM
architectures respectively. Since the associative memory searches
all words in parallel and has match indicators which record all ties
or equals, the same algorithm satisfies both the find first and the
find all search criteria. In the AM/RAM architecture, however, there
is an extra step (LD2), which results in a slightly different timing
for S2. S3 and S4 are the algorithms to find the minimum or maximum
for the AM and AM/RAM respectively. The difference between minimum
and maximum lies in which comparison is used in the CMP instruction.
Minimum is shown in the figure. Again S4 differs from S3 by the
addition of the LD2 command, which alters the timing